Gene Expression in Breast Cancer

This application claims priority of U.S. Provisional Application No. 60/456,735, filed 
March 20, 2003, the disclosure of which is incorporated herein by reference in its entirety. 

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

The research described in this application was supported in part by a grant (No. P50 
CA89393-01) and a National Research Service Award (No. 5F32 CA94788-02) from the 
National Cancer Institute of the National Institutes of Health and a grant (No. DAMD17-01-1-0221)
from the Department of Defense. Thus the government has certain rights in the invention.

TECHNICAL FIELD 

This invention relates to breast cancer, and more particularly to genes expressed in breast 
cancer cells. 

BACKGROUND 

Ductal carcinoma in situ (DCIS) of the breast includes a heterogeneous group of pre- 
invasive breast tumors with a wide range of invasive potential. In order to initiate early 
aggressive treatment where needed but to avoid such treatment, and its frequent harsh side 
effects, where not needed, it is important that methods to distinguish between DCIS and invasive 
breast cancer and between different types of DCIS be developed. 

SUMMARY 

The invention is based on the inventors' discovery of differing patterns of gene 
expression in breast cancer cells versus normal cells, in DCIS cells versus invasive and/or 
metastatic breast cancer cells, and between different grades of DCIS. The invention thus 
includes methods of diagnosis, methods of treatment, nucleic acids corresponding to newly 
identified genes, polypeptides encoded by such genes, and methods of screening for gene
expression. 

More specifically, the invention features a method of diagnosis. The method includes the 
steps of: (a) providing a test sample of breast tissue; (b) determining the level of expression in
the test sample of a gene selected from those listed in Table 1; and (c) if the gene is expressed in
the test sample at a lower level than in a control normal breast tissue sample, diagnosing the test 
sample as containing cancer cells. 

The invention also provides a method of determining the grade of a ductal carcinoma in
situ (DCIS). The method includes the steps of: (a) providing a test sample of DCIS tissue; (b)
deriving a test expression profile for the test sample by determining the level of expression in the
test sample of ten or more genes selected from those listed in Tables 2-16; (c) comparing the test
expression profile to control expression profiles of the ten or more genes in control samples of
high grade, intermediate grade, and low grade DCIS; (d) selecting the control expression profile
that most closely resembles the test expression profile; and (e) assigning to the test sample a
grade that matches the grade of the control expression profile selected in step (d). The ten or
more genes can be: 25 or more genes; 50 or more genes; 100 or more genes; 200 or more genes;
or 500 or more genes.
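
Steps (b) through (e) of the grading method can be sketched in Python as follows. The sketch is illustrative only and is not a description of the inventors' procedure: the specification does not prescribe a similarity measure, so Pearson correlation of log-transformed expression levels is assumed here, and the gene identifiers, expression values, and function names (pearson, assign_grade) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def assign_grade(test_profile, control_profiles):
    """Return the grade whose control profile most closely resembles the test profile.

    test_profile: dict mapping gene -> expression level.
    control_profiles: dict mapping grade -> dict of gene -> expression level.
    Similarity is ASSUMED to be Pearson correlation of log2(level + 1).
    """
    genes = sorted(test_profile)
    test = [math.log2(test_profile[g] + 1) for g in genes]
    best_grade, best_score = None, -2.0  # correlation is never below -1
    for grade, profile in control_profiles.items():
        control = [math.log2(profile[g] + 1) for g in genes]
        score = pearson(test, control)
        if score > best_score:
            best_grade, best_score = grade, score
    return best_grade


# Hypothetical ten-gene profiles (placeholder identifiers and values).
genes = ["GENE_%02d" % i for i in range(1, 11)]
controls = {
    "high": dict(zip(genes, [50, 40, 30, 20, 10, 1, 2, 3, 4, 5])),
    "intermediate": dict(zip(genes, [10, 10, 10, 10, 10, 12, 9, 11, 10, 8])),
    "low": dict(zip(genes, [1, 2, 3, 4, 5, 50, 40, 30, 20, 10])),
}
test_sample = dict(zip(genes, [45, 42, 28, 22, 9, 2, 1, 4, 3, 6]))
grade = assign_grade(test_sample, controls)
```

Other similarity or distance measures (e.g., Euclidean distance) could be substituted in assign_grade without changing the overall structure of the comparison.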

Another aspect of the invention is a method of determining the likelihood of a breast 
cancer being DCIS or invasive breast cancer. The method includes the steps of: (a) providing a 
test sample of breast tissue; (b) determining the level of expression in the test sample of a gene 
selected from the group consisting of a gene encoding CD74, a gene encoding MGC2328, a 
gene encoding S100A7, a gene encoding KRT19, a gene encoding trefoil factor 3 (TFF3), a gene 
encoding osteonectin, and a gene identified by a SAGE tag consisting of the nucleotide sequence 
CTGGGCGCCC; and (c) determining whether the level of expression of the selected gene in the 
test sample more closely resembles the level of expression of the selected gene in control cells of 
(i) DCIS or (ii) invasive breast cancer; and (d) classifying the test sample as: (i) likely to be 
DCIS if the level of expression of the gene in the test sample more closely resembles the level of 
expression of the gene in DCIS cells; or (ii) likely to be invasive breast cancer if the level of 
expression of the gene in the test sample more closely resembles the level of expression of the 
gene in invasive breast cancer cells. 
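
The comparison in steps (c) and (d) for a single marker gene can be sketched as below. This is a minimal illustration under stated assumptions: "more closely resembles" is interpreted here as the smaller absolute difference between log2 expression levels, which the specification does not require, and the function name and numeric values are hypothetical.

```python
import math


def classify_by_marker(test_level, dcis_level, invasive_level):
    """Classify a sample by which control level the test level is closer to.

    All levels must be positive. Closeness is ASSUMED to be measured on a
    log2 scale, so that fold differences rather than absolute differences count.
    """
    if min(test_level, dcis_level, invasive_level) <= 0:
        raise ValueError("expression levels must be positive")
    d_dcis = abs(math.log2(test_level) - math.log2(dcis_level))
    d_invasive = abs(math.log2(test_level) - math.log2(invasive_level))
    if d_dcis <= d_invasive:
        return "likely DCIS"
    return "likely invasive breast cancer"


# Hypothetical control levels for one marker gene.
result_a = classify_by_marker(80.0, dcis_level=100.0, invasive_level=10.0)
result_b = classify_by_marker(12.0, dcis_level=100.0, invasive_level=10.0)
```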

Also embraced by the invention is a method of predicting the prognosis of a breast cancer 
patient. The method includes the steps of: (a) providing a sample of primary invasive breast
cancer tissue from a test patient; and (b) determining the level of expression in the sample of a
gene encoding S100A7 or a gene encoding fatty acid synthase (FASN). A level of expression
higher than in a control sample of primary invasive breast carcinoma from a patient with a good
prognosis is an indication that the prognosis of the test patient is poor. 

Another method of diagnosis includes the steps of: (a) providing a test sample of breast 
tissue comprising a test stromal cell; and (b) determining the level of expression in the stromal 
cell of a gene selected from those listed in Tables 7, 8, 10, 15, and 16, the gene being one that
is expressed in a cell of the same type as the test stromal cell at a substantially higher level when 
present in breast cancer tissue than when present in normal breast tissue; and (c) classifying the 
test sample as: (i) normal breast tissue if the level of expression of the gene in the test stromal 
cell is not substantially higher than a control level of expression for a cell of the same type as the 
test stromal cell in normal breast tissue; (ii) breast cancer tissue if the level of expression of the 
gene in the test stromal cell is substantially higher than a control level of expression for a cell of 
the same type as the test stromal cell in normal breast tissue. The stromal cells in the test sample 
and the standard samples can be leukocytes and the genes selected from those listed in Tables 7 
and 15, e.g., genes encoding interleukin-1β (IL-1β) or macrophage inhibitory
protein 1α (MIP1α). The stromal cells in the test sample and the standard samples can also be
myoepithelial cells or myofibroblasts and the genes selected from those listed in Tables 8, 15,
and 16, e.g., genes encoding cathepsins F, K, and L, MMP2, PRSS11, thrombospondin 2,
SERPING1, cytostatin C, TIMP3, platelet-derived growth factor receptor β-like (PDGFRBL), a
collagen, collagen triple helix repeat containing 1 (CTHRC1), CXCL12, or CXCL14. The
stromal cells in the test sample and the standard samples can be endothelial cells and the genes
selected from those listed in Tables 10 and 15. Moreover, the stromal cells in the test sample and 
the standard samples can be fibroblasts and the genes selected from those listed in Table 15. 

Another feature of the invention is a method of diagnosis that involves: (a) providing a test
sample of breast tissue comprising a test stromal cell; and (b) determining the level of expression 
in the stromal cell of a gene selected from those listed in Tables 7, 8, 10, and 15, the gene being 
one that is expressed in a cell of the same type as the test stromal cell at a substantially higher 
level when present in normal breast tissue than when present in breast cancer tissue; and (c) 
classifying the test sample as: (i) normal breast tissue if the level of expression of the gene in the
test stromal cell is not substantially lower than a control level of expression for a cell of the same 
type as the test stromal cell in normal breast tissue; (ii) breast cancer tissue if the level of 
expression of the gene in the test stromal cell is substantially lower than a control level of
expression for a cell of the same type as the test stromal cell in normal breast tissue. The stromal
cells in the test sample and the standard samples can be leukocytes and the genes selected from 
those listed in Tables 7 and 15. Alternatively, the stromal cells in the test sample and the 
standard samples can be myoepithelial cells or myofibroblasts and the genes selected from those 
listed in Tables 8 and 15. Furthermore, the stromal cells in the test sample and the standard 
samples can be endothelial cells and the genes can be selected from those listed in Tables 10 and 
15. In addition, the stromal cells in the test sample and the standard samples can be fibroblasts 
and the genes selected from those listed in Table 15. 

In another aspect, the invention provides a method of diagnosis that involves: (a) providing
a test sample of breast tissue comprising a test epithelial cell of the luminal
epithelial type; (b) determining the level of expression in the test epithelial cell of a gene selected 
from those listed in Tables 9 and 15, the gene being one that is expressed in cancerous epithelial 
cells of the luminal epithelial cell type at a substantially higher level than those in normal breast 
tissue; and (c) classifying the test sample as: (i) normal breast tissue if the level of expression of 
the gene in the test epithelial cell is not substantially higher than a control level of expression for
an epithelial cell of luminal epithelial cell type in normal breast tissue; (ii) breast cancer tissue if 
the level of expression of the gene in the test epithelial cell is substantially higher than a control 
level of expression for an epithelial cell of the luminal epithelial type in normal breast tissue. 

Also featured by the invention is a method of diagnosis that includes: (a) providing a test 
sample of breast tissue comprising a test epithelial cell of the luminal epithelial type; and
(b) determining the level of expression in the test epithelial cell of a gene selected from those
listed in Table 9, the gene being one that is expressed in epithelial cells of the luminal epithelial 
cell type at a substantially lower level when present in breast cancer tissue than when present in 
normal breast tissue; and (c) classifying the test sample as: (i) normal breast tissue if the level of 
expression of the gene in the test epithelial cell is not substantially lower than a control level of 
expression for an epithelial cell of luminal epithelial cell type in normal breast tissue; (ii) breast 
cancer tissue if the level of expression of this gene in the test epithelial cell is substantially lower 
than a control level of expression for an epithelial cell of the luminal epithelial type in normal 
breast tissue.

In all the above methods of the invention the level of expression of the gene can be
determined as a function of the level of protein encoded by the gene or as a function of the level
of mRNA transcribed from the gene.

Another embodiment of the invention is a method of inhibiting proliferation or survival 
of a breast cancer cell. The method involves contacting a breast cancer cell with a polypeptide 
that is encoded by a gene selected from those listed in Tables 1, 7-10, and 15, the gene being 
one that is expressed in the cancer cell, or a stromal cell in a tumor comprising the cancer cell, at 
a level substantially lower than in a normal cell of the same type. In the method, the cancer cell
can be in vitro. Alternatively, it can be in a mammal, e.g., a human. The contacting can include
administering the polypeptide to the mammal or administering a polynucleotide encoding the 
polypeptide to the mammal. The method can also involve: (a) providing a recombinant cell that 
is the progeny of a cell obtained from the mammal and has been transfected or transformed ex 
vivo with a nucleic acid encoding the polypeptide; and (b) administering the recombinant cell to 
the mammal, so that the recombinant cell expresses the polypeptide in the mammal. 

Another feature of the invention is a method of inhibiting pathogenesis of a breast cancer 
cell or stromal cell in a tumor of a mammal. The method includes: (a) identifying a mammal 
with a breast cancer tumor; and (b) administering to the mammal an agent that inhibits binding of 
a polypeptide encoded by a gene selected from those listed in Tables 2-10, 15, and 16 to its 
receptor or ligand, the gene being one that is expressed in a breast cancer cell in the tumor, or in 
a stromal cell in the tumor, at a level substantially higher than in a corresponding cell in a non- 
cancerous breast. The polypeptide is a secreted polypeptide or a cell-surface polypeptide. The 
agent can be a non-agonist antibody that binds to the polypeptide, a soluble form of the receptor, 
or a non-agonist antibody that binds to the receptor or ligand. The polypeptide can be, for 
example, CXCL12 or CXCL14 and the receptor can be, for example, CXCR4 or a receptor for
CXCL14. 

Another aspect of the invention is a method of inhibiting expression of a gene in a cell. 
The method includes introducing into a target cell selected from the group consisting of (a) a 
breast cancer cell and (b) a stromal cell in a tumor comprising a breast cancer cell, an agent that
inhibits expression of a gene selected from those listed in Tables 2-10, 15, and 16, the gene
being one that is expressed in the target cell at a level substantially higher than in a
corresponding cell in normal breast tissue. The agent can be an antisense oligonucleotide that
hybridizes to an mRNA transcribed from the gene. The introducing step can involve
administration of the antisense oligonucleotide to the target cell. Alternatively, the introducing
step can comprise administering to the target cell a nucleic acid comprising a transcriptional
regulatory element (TRE) operably linked to a nucleotide sequence complementary to the
antisense oligonucleotide, wherein transcription of the nucleotide sequence inside the target cell
produces the antisense oligonucleotide. The agent can also be an RNAi molecule, one strand of
the RNAi molecule having the ability to hybridize to an mRNA transcribed from the gene. The
agent can also be a small molecule that inhibits expression of the gene. The gene can be one that
encodes, for example, CXCL12, CXCL14, CXCR4, or a receptor for CXCL14.

Also provided by the invention is an isolated DNA that includes: (a) the nucleotide 
sequence of a tag selected from those listed in Fig. 7; or (b) the complement of the nucleotide 
sequence. Also embraced by the invention is a vector containing the DNA. In the vector, the 
DNA can optionally be operatively linked to a transcriptional regulatory element (TRE). A cell
comprising any of the vectors of the invention is also an aspect of the invention. Also included 
in the invention is an isolated polypeptide encoded by the DNA of the invention. 

In another aspect, the invention embraces a single stranded nucleic acid probe that 
includes: (a) the nucleotide sequence of a tag selected from those listed in Tables 1-5, 7-10, 15, 
and 16; or (b) the complement of the nucleotide sequence. 
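
By way of illustration only, the complement of a tag nucleotide sequence, written in the conventional 5' to 3' direction (i.e., the reverse complement), can be computed as sketched below. The SAGE tag CTGGGCGCCC mentioned herein is used as an example input; the function name reverse_complement is hypothetical and the sketch assumes an unambiguous DNA sequence containing only A, C, G, and T.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(tag):
    """Return the complementary strand of a DNA tag, read 5' to 3'."""
    tag = tag.upper()
    if set(tag) - set("ACGT"):
        raise ValueError("tag must contain only A, C, G, and T")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return tag.translate(_COMPLEMENT)[::-1]


probe = reverse_complement("CTGGGCGCCC")  # "GGGCGCCCAG"
```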

Also embodied by the invention is an array that includes a substrate having at least 10 
addresses, each address having disposed on it a capture probe that includes a nucleic acid 
sequence consisting of a tag nucleotide sequence selected from those listed in Tables 1-5, 7-10, 
15, and 16. The tag nucleotide sequence can be one that corresponds to a gene encoding a 
protein selected from the group consisting of fatty acid synthase (FASN), trefoil factor 3 (TFF3), 
X-box binding protein 1 (XBP1), interferon alpha inducible protein 6-16 (IFI-6-16), cysteine- 
rich protein 1 (CRIP1), interferon-stimulated protein 15 kDa (ISG15), interferon alpha inducible 
protein 27 (IFI27), brain expressed X linked 1 (BEX1), helicase/primase protein (LOC150678), 
anaphase promoting complex subunit 11 (ANAPC11), Fer-1-like 4 (FER1L4), psoriasin,
connective tissue growth factor (CTGF), regulator of G-protein signaling 5 (RGS5), paternally
expressed 10 (PEG10), osteonectin (SPARC), LOC51235, CD74, MGC23280, Invasive Breast
Cancer 1 (IBC-1), Apolipoprotein D (APOD), carboxypeptidase B1 (CPB1), retinal binding
protein 1 (RBP1), FLJ30428, calmodulin-like skin protein (CLSP), nudix (NUDT8),
MGC14480, interleukin-1β (IL-1β), macrophage inhibitory protein 1α (MIP1α), cathepsins F, K,
and L, MMP2, PRSS11, thrombospondin 2, SERPING1, cytostatin C, TIMP3, platelet-derived
growth factor receptor β-like (PDGFRBL), a collagen, collagen triple helix repeat containing 1
(CTHRC1), CXCL12, CXCL14, and a protein encoded by a gene identified by a SAGE tag
consisting of the nucleotide sequence CTGGGCGCCC. The array can contain at least 25
addresses; at least 50 addresses; at least 100 addresses; at least 200 addresses; or at least 500
addresses.

The invention also features a kit comprising at least 10 probes, each probe including a 
nucleic acid sequence that includes a tag nucleotide sequence selected from those listed in Tables 
1-5, 7-10, 15, and 16. The kit can contain at least 25 probes; at least 50 probes; at least 100
probes; at least 200 probes; or at least 500 probes.

Another kit provided by the invention is one that contains at least 10 antibodies each of 
which is specific for a different protein encoded by a gene identified by a tag selected from the 
group consisting of the tags listed in Tables 1-5, 7-10, 15, and 16. The antibodies can, for
example, be specific for a protein selected from the group consisting of fatty acid synthase
(FASN), trefoil factor 3 (TFF3), X-box binding protein 1 (XBP1), interferon alpha inducible
protein 6-16 (IFI-6-16), cysteine-rich protein 1 (CRIP1), interferon-stimulated protein 15 kDa
(ISG15), interferon alpha inducible protein 27 (IFI27), brain expressed X linked 1 (BEX1),
helicase/primase protein (LOC150678), anaphase promoting complex subunit 11 (ANAPC11),
Fer-1-like 4 (FER1L4), psoriasin, connective tissue growth factor (CTGF), regulator of G-
protein signaling 5 (RGS5), paternally expressed 10 (PEG10), osteonectin (SPARC), LOC51235,
CD74, MGC23280, Invasive Breast Cancer 1 (IBC-1), Apolipoprotein D (APOD),
carboxypeptidase B1 (CPB1), retinal binding protein 1 (RBP1), FLJ30428, calmodulin-like skin
protein (CLSP), nudix (NUDT8), MGC14480, interleukin-1β (IL-1β), macrophage inhibitory
protein 1α (MIP1α), cathepsins F, K, and L, MMP2, PRSS11, thrombospondin 2, SERPING1,
cytostatin C, TIMP3, platelet-derived growth factor receptor β-like (PDGFRBL), a collagen,
collagen triple helix repeat containing 1 (CTHRC1), CXCL12, CXCL14, and a protein encoded
by a gene identified by a SAGE tag consisting of the nucleotide sequence CTGGGCGCCC.
The kit can contain at least 25 antibodies; at least 50 antibodies; at least 100 antibodies; at least
200 antibodies; or at least 500 antibodies.

In addition, the invention provides a method of identifying the grade of a DCIS. The
method involves: (a) providing a test sample of DCIS tissue; (b) using the above-described array
to determine a test expression profile of the sample; (c) providing a plurality of reference 
profiles, each derived from a DCIS of a defined grade, the test expression profile and each 
reference profile having a plurality of values, each value representing the expression level of a 
gene corresponding to a tag selected from those listed in Tables 1-5, 7-10, 15, and 16; and 
(d) selecting the reference profile most similar to the test expression profile, to thereby identify 
the grade of the test DCIS. 

In another embodiment, the invention provides a method of determining whether a breast 
cancer is a DCIS or an invasive breast cancer. The method involves: (a) providing a test sample 
of breast cancer tissue; (b) determining the level of expression of CXCL14 in myofibroblasts in 
the test sample; (c) determining whether the level of expression of CXCL14 in the 
myofibroblasts in the test sample more closely resembles the level of expression of CXCL14 in 
control myofibroblasts of (i) DCIS or (ii) invasive breast cancer; and (d) classifying the test 
sample as: (i) DCIS if the level of expression of CXCL14 in myofibroblasts in the test sample 
more closely resembles the level of expression of CXCL14 in control myofibroblasts of DCIS; 
(ii) invasive breast cancer if the level of expression of CXCL14 in myofibroblasts in the test 
sample more closely resembles the level of expression of CXCL14 in control myofibroblasts of 
invasive breast cancer. 

"Polypeptide" and "protein" are used interchangeably and mean any peptide-linked chain
of amino acids, regardless of length or post-translational modification. 

The term "isolated" polypeptide or peptide fragment as used herein refers to a 
polypeptide or a peptide fragment which either has no naturally-occurring counterpart or has 
been separated or purified from components which naturally accompany it, e.g., in tissues such 
as pancreas, liver, spleen, ovary, testis, muscle, joint tissue, neural tissue, gastrointestinal tissue, 
or breast tissue or tumor tissue (e.g., breast cancer tissue), or body fluids such as blood, serum, 
or urine. Typically, the polypeptide or peptide fragment is considered "isolated" when it is at 
least 70%, by dry weight, free from the proteins and other naturally-occurring organic molecules 
with which it is naturally associated. Preferably, a preparation of a polypeptide (or peptide 
fragment thereof) of the invention is at least 80%, more preferably at least 90%, and most
preferably at least 99%, by dry weight, the polypeptide (or the peptide fragment thereof),
respectively, of the invention. Since a polypeptide that is chemically synthesized is, by its
nature, separated from the components that naturally accompany it, the synthetic polypeptide is 
"isolated." 

An isolated polypeptide (or peptide fragment) of the invention can be obtained, for 
example, by extraction from a natural source (e.g., from tissues or bodily fluids); by expression 
of a recombinant nucleic acid encoding the polypeptide; or by chemical synthesis. A 
polypeptide that is produced in a cellular system different from the source from which it 
naturally originates is "isolated," because it will necessarily be free of components which 
naturally accompany it. The degree of isolation or purity can be measured by any appropriate
method, e.g., column chromatography, polyacrylamide gel electrophoresis, or HPLC analysis. 

An "isolated DNA" is either (1) a DNA that contains sequence not identical to that of any 
naturally occurring sequence, or (2), in the context of a DNA with a naturally-occurring 
sequence (e.g., a cDNA or genomic DNA), a DNA free of at least one of the genes that flank the 
gene containing the DNA of interest in the genome of the organism in which the gene containing 
the DNA of interest naturally occurs. The term therefore includes a recombinant DNA 
incorporated into a vector, into an autonomously replicating plasmid or virus, or into the
genomic DNA of a prokaryote or eukaryote. The term also includes a separate molecule such as:
a cDNA where the corresponding genomic DNA has introns and therefore a different sequence; a
genomic fragment that lacks at least one of the flanking genes; a fragment of cDNA or genomic
DNA produced by polymerase chain reaction (PCR) and that lacks at least one of the flanking
genes; a restriction fragment that lacks at least one of the flanking genes; a DNA encoding a non-
naturally occurring protein such as a fusion protein, mutein, or fragment of a given protein; and a
nucleic acid which is a degenerate variant of a cDNA or a naturally occurring nucleic acid. In
addition, it includes a recombinant nucleotide sequence that is part of a hybrid gene, i.e., a gene
encoding a non-naturally occurring fusion protein. It will be apparent from the foregoing that
isolated DNA does not mean a DNA present among hundreds to millions of other DNA
molecules, within, for example, cDNA or genomic DNA libraries or genomic DNA restriction
digests in, for example, a restriction digest reaction mixture or an electrophoretic gel slice.

As used herein, a "functional fragment" of a polypeptide is a fragment of the polypeptide 
that is shorter than the full-length, mature polypeptide and has at least 5% (e.g., at least: 5%;
10%; 20%; 30%; 40%; 50%; 60%; 70%; 80%; 90%; 95%; 98%; 99%; 100%; or more) of the
activity (e.g., ability to inhibit proliferation of breast cancer cells) of the full-length, mature
polypeptide. Fragments of interest can be made either by recombinant, synthetic, or proteolytic
digestive methods. Such fragments can then be isolated and tested for their ability, for example,
to inhibit the proliferation of cancer cells as measured by [3H]-thymidine incorporation or cell
counting. 

As used herein, "operably linked" means incorporated into a genetic construct so that 
expression control sequences effectively control expression of a coding sequence of interest. 

As used herein, the term "antibody" refers not only to whole antibody molecules, but also 
to antigen-binding fragments, e.g., Fab, F(ab')2, Fv, and single chain Fv (ScFv) fragments. Also 
included are chimeric antibodies. 

As used herein, the term "pathogenesis" of a cell (e.g., a cancer cell or stromal cell within
a tumor containing a cancer cell) means proliferation of a cell, survival of a cell, invasiveness of
a cell, migratory potential of a cell, metastatic potential of a cell, ability of a cell to evade immune
effector mechanisms, ability of a cell to induce or enhance angiogenesis, or ability of a cell to
induce or enhance lymphangiogenesis.

As used herein, a gene that is expressed at a "substantially higher level" in a first cell (or 
first tissue) than in a second cell (or second tissue) is a gene that is expressed in the first cell (or
tissue) at a level at least 2 (e.g., at least: 2; 3; 4; 5; 6; 7; 8; 9; 10; 15; 20; 30; 40; 50; 75; 100;
200; 500; 1,000; 2,000; 5,000; or 10,000) times higher than in the second cell (or second tissue).

As used herein, a gene that is expressed at a "substantially lower level" in a first cell (or 
first tissue) than in a second cell (or second tissue) is a gene that is expressed in the first cell (or
tissue) at a level at least 2 (e.g., at least: 2; 3; 4; 5; 6; 7; 8; 9; 10; 15; 20; 30; 40; 50; 75; 100;
200; 500; 1,000; 2,000; 5,000; or 10,000) times lower than in the second cell (or second tissue).
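
The two definitions above can be expressed as a simple fold-change test, sketched below. The default threshold of 2 follows the minimum recited in the definitions; the function name, the returned labels for the intermediate case, and the numeric inputs are hypothetical and serve only as an illustration.

```python
def compare_expression(level_first, level_second, threshold=2.0):
    """Classify expression in a first cell relative to a second cell.

    "Substantially higher" means at least `threshold` times higher, and
    "substantially lower" means at least `threshold` times lower. Both
    levels must be positive so that the ratio is well defined.
    """
    if level_first <= 0 or level_second <= 0:
        raise ValueError("expression levels must be positive")
    if threshold < 2:
        raise ValueError("the definitions above require a threshold of at least 2")
    ratio = level_first / level_second
    if ratio >= threshold:
        return "substantially higher"
    if ratio <= 1.0 / threshold:
        return "substantially lower"
    return "not substantially different"


higher = compare_expression(40.0, 10.0)   # 4-fold higher
lower = compare_expression(10.0, 40.0)    # 4-fold lower
neither = compare_expression(15.0, 10.0)  # only 1.5-fold higher
```

A more stringent cutoff from the recited list (e.g., 10-fold) can be applied by passing threshold=10.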

Unless otherwise defined, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art to which this invention
pertains. In case of conflict, the present document, including definitions, will control. Preferred
methods and materials are described below, although methods and materials similar or equivalent
to those described herein can be used in the practice or testing of the present invention. All
publications, patent applications, patents and other references mentioned herein are incorporated
by reference in their entirety. The materials, methods, and examples disclosed herein are
illustrative only and not intended to be limiting.

Other features and advantages of the invention, e.g., diagnosing breast cancer, will be
apparent from the following description, from the drawings and from the claims.

DESCRIPTION OF DRAWINGS 

Fig. 1 is a diagrammatic representation of the antibody-based procedure used to purify
epithelial and stromal cells from DCIS and normal breast tissue for the analysis described in
Example 6.

Fig. 2 is a series of photographs of ethidium bromide-stained electrophoretic gels of the
products of RT-PCRs. The RT-PCR analysis was carried out on mRNA isolated from:
(a) luminal epithelial cells ("epithelium"), myoepithelial cells ("myoepithelium"), leukocytes, and
endothelial cells ("endothelium") purified from two DCIS tumor samples ("DCIS6" and
"DCIS7"); and (b) leukocytes and endothelial cells ("endothelium") from normal breast tissue
("Normal"). The PCR phases of the RT-PCRs were carried out with oligonucleotide primers
specific for two constitutively expressed genes (β-actin ("BAC") and L19) and for HER2
(expressed by some breast cancers), CALLA (a myoepithelial cell marker), CD45 (a pan-
leukocyte marker), and a cell surface protein specifically expressed by endothelial cells
("CDH5"). The numbers at the bottom of each column of photographs ("25", "30", and "35")
indicate numbers of PCR cycles.

Fig. 3A is a dendrogram showing the relatedness of SAGE libraries generated from
normal mammary luminal epithelial cells (N1 and N2), DCIS cells (D1-D7 and T18), primary
invasive breast cancer cells (I1-I6), breast cancer cells in lymph node metastases (LN1 and LN2),
and breast cancer cells in a distant lung metastasis (M1) and analyzed by hierarchical clustering.

Fig. 3B is a dendrogram showing similarities among intermediate and high grade DCIS 
tumor SAGE libraries analyzed by hierarchical clustering using 582 genes. 

Fig. 3C is a dendrogram showing similarities among intermediate and high grade DCIS 
tumor SAGE libraries analyzed by hierarchical clustering using 26 genes selected from the 582 
genes used for the analysis depicted in Fig. 3B.

Fig. 4A is a series of photomicrographs showing the hybridization of riboprobes
corresponding to genes encoding IFI-6-16, S100A7, CTGF, and RGS5 to frozen sections of
DCIS tumors (T18, 96-331, 6164) and normal breast tissue (N24). Strong expression (indicated
by dark staining) of IFI-6-16 and S100A7 is detected in tumor cells of a subset of DCIS tumors
but not in normal breast tissue epithelial cells. Expression of CTGF and RGS5 is seen mostly in
DCIS stromal fibroblasts and myoepithelial cells, respectively, but not in the corresponding cells 
in normal breast tissue. 

Fig. 4B is a dendrogram showing the relatedness of five normal breast tissues, and 18
DCIS and invasive tumors analyzed for expression of 14 genes (SCGB3A1, TM4SF1, CTGF,
XBP1, IFI27, ISG15, RGS5, RGS5, LOC150678, BEX1, PEG10, IFI-6-16, TFF3, CRIP1,
S100A7, and CTGF) by mRNA in situ hybridization. Numbers are specimen identifiers. "N" 
denotes normal breast tissue, "D" denotes DCIS tissue, and "I" denotes invasive breast cancer 
tissue. 

Fig. 4C is a series of photomicrographs showing immunohistochemical staining of sections
of a representative DCIS tumor in a tissue microarray. The tissue sections were stained with
monoclonal antibodies specific for the indicated proteins. Dark staining indicates the presence of
the protein. The data thus indicate the presence of S100A7, TFF3, SPARC, and CTGF but
absence of IBC-1 in the DCIS tumor.

Fig. 5 is a diagrammatic representation of the antibody-based procedure used to purify
epithelial and stromal cells from DCIS and normal breast tissue for the analysis described in 
Example 7. 

Fig. 6A is a line graph depicting the results of a Scatchard analysis of alkaline phosphatase
(AP) conjugated CXCL14 (AP-CXCL14) binding to MDA-MB-231 breast cancer cells. 

Fig. 6B is a series of line graphs showing the effect of AP-CXCL14 (left and right panels) 
and CXCL12 (center panel) on the growth of MDA-MB-231 breast cancer cells (left and center 
panels) and MCF10A immortalized normal breast epithelial cells (right panel). 

Fig. 6C is a pair of bar graphs showing the ability of CXCL14 N-terminally conjugated 
with AP (AP-CXCL14), or C-terminally conjugated with AP (CXCL14-AP), to enhance 
migration (left panel) and invasion (right panel) of MDA-MB-231 breast cancer cells. The
cultures containing the CXCL14 conjugates (and corresponding control cultures) were in serum-
free medium. Data from control cultures carried out in medium containing 10% FBS and no
CXCL14 conjugate are shown ("10% FBS").

Fig. 7 is a depiction of the nucleotide sequences of SAGE tags that are listed in Tables 1- 
4, 7, 8, 10, and 15 and that correspond to no cDNA or mRNA nucleotide sequences present in the 
publicly available databases searched by the inventors. 

DETAILED DESCRIPTION 

Various aspects of the invention are described below. 

Nucleic Acid Molecules 

The nucleic acid molecules of the invention include those containing or consisting of the 
nucleotide sequences (or the complements thereof) of the SAGE (serial analysis of gene 
expression) tags listed in Fig. 7. The nucleic acid molecules of the invention can be cDNA, 
genomic DNA, synthetic DNA, or RNA, and can be double-stranded or single-stranded (i.e., 
either a sense or an antisense strand). Segments of these molecules are also considered within 
the scope of the invention, and can be produced by, for example, the polymerase chain reaction 
(PCR) or generated by treatment with one or more restriction endonucleases. A ribonucleic acid 
(RNA) molecule can be produced by in vitro transcription. Preferably, the nucleic acid 
molecules encode polypeptides that, regardless of length, are soluble under normal physiological 
conditions. 

The nucleic acid molecules of the invention can contain naturally occurring sequences, or 
sequences that differ from those that occur naturally, but, due to the degeneracy of the genetic 
code, encode the same polypeptide. In addition, these nucleic acid molecules are not limited to 
coding sequences, e.g., they can include some or all of the non-coding sequences that lie 
upstream or downstream from a coding sequence. They can also contain irrelevant sequences at 
their 5' and/or 3' ends (e.g., sequences derived from a vector). 

The nucleic acid molecules of the invention can be synthesized (for example, by 
phosphoramidite-based synthesis) or obtained from a biological cell, such as the cell of a 
mammal. The nucleic acids can be those of a human, non-human primate (e.g., monkey), mouse, 
rat, guinea pig, cow, sheep, horse, pig, rabbit, dog, or cat. Combinations or modifications of the 
nucleotides within these types of nucleic acids are also encompassed. 

In addition, the isolated nucleic acid molecules of the invention encompass segments that 
are not found as such in the natural state. Thus, the invention encompasses recombinant nucleic 
acid molecules incorporated into a vector (for example, a plasmid or viral vector) or into the 
genome of a heterologous cell (or the genome of a homologous cell, at a position other than the 
natural chromosomal location). Recombinant nucleic acid molecules and uses therefor are 
discussed further below. 

Techniques associated with detection or regulation of genes are well known to skilled 
artisans. Such techniques can be used to diagnose and/or treat disorders (e.g., DCIS or invasive 
cancer) associated with aberrant expression of the genes corresponding to the SAGE tags listed 
in Fig. 7. 

Family members of the genes or proteins of the invention can be identified 
based on their similarity to the relevant gene or protein, respectively. For example, the 
identification can be based on sequence identity. The invention features isolated nucleic acid 
molecules which are at least 50% (or at least: 55%; 65%; 75%; 85%; 95%; 98%; 99%; 99.5%; or 
even 100%) identical to: (a) nucleic acid molecules that encode polypeptides encoded by genes 
corresponding to the SAGE tags listed in Fig. 7; (b) the nucleotide sequences of the coding 
regions of genes corresponding to the SAGE tags listed in Fig. 7; (c) nucleic acid molecules that 
include a segment of at least 30 (e.g., at least: 40; 50; 60; 80; 100; 125; 150; 175; 200; 250; 300; 
500; 700; 1,000; 2,000; 3,000; 5,000; 10,000; or more) nucleotides of the coding regions of genes 
corresponding to the SAGE tags listed in Fig. 7; (d) nucleic acid molecules that include the 
genomic sequences of genes corresponding to the SAGE tags listed in Fig. 7; (e) nucleic acid 
molecules that include a segment of at least 30 (e.g., at least: 40; 50; 60; 80; 100; 125; 150; 175; 
200; 250; 300; 500; 700; 1,000; 2,000; 3,000; 5,000; 10,000; or more) nucleotides of the genomic 
sequences of genes corresponding to the SAGE tags listed in Fig. 7; and (f) nucleic acid 
molecules containing or consisting of the SAGE tags listed in Fig. 7. 

The determination of percent identity between two sequences is accomplished using the 
mathematical algorithm of Karlin and Altschul [(1990) Proc. Natl. Acad. Sci. USA 87:2264- 
2268] modified as in Karlin and Altschul [(1993) Proc. Natl. Acad. Sci. USA 90:5873-5877]. 
Such an algorithm is incorporated into the BLASTN and BLASTP programs of Altschul et al. 
[(1990) J. Mol. Biol. 215:403-410]. BLAST nucleotide searches are performed with the 
BLASTN program, score = 100, wordlength = 12, to obtain nucleotide sequences homologous to 
any of the nucleic acid molecules described herein. BLAST protein searches are performed with 
the BLASTP program, score = 50, wordlength = 3, to obtain amino acid sequences homologous 
to the polypeptides encoded by any of the nucleic acid molecules described herein. To obtain 
gapped alignments for comparative purposes, Gapped BLAST is utilized as described in Altschul 
et al. [(1997) Nucleic Acids Res. 25:3389-3402]. When utilizing BLAST and Gapped BLAST 
programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) are 
used. 
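
By way of illustration only, once two sequences have been aligned (for example, by one of the programs named above), a percent identity figure can be computed by counting identical aligned positions. The following Python sketch is not the Karlin and Altschul algorithm and does not invoke BLAST; the function names, the use of "-" as the gap character, and the choice of the full alignment length as the denominator are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumptions (not taken from the specification): the inputs are already
    aligned, '-' marks a gap, and the denominator is the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        # A column counts as identical only if neither position is a gap.
        if a != "-" and a == b:
            matches += 1
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str, minimum: float = 50.0) -> bool:
    """True if the aligned pair is at least `minimum` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= minimum
```

With this convention, a gap opposite a residue counts as a mismatch, so an alignment containing gaps gives a lower figure than the same alignment scored over ungapped columns only.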

Hybridization can also be used as a measure of homology between two nucleic acid 
sequences. A nucleic acid sequence, or a portion thereof, can be used as a hybridization probe 
according to standard hybridization techniques. The hybridization of a nucleic acid probe 
specific for a target DNA or RNA of interest to DNA or RNA from a test source (e.g., a 
mammalian cell) is an indication of the presence of the target DNA or RNA in the test source. 
Hybridization conditions are known to those skilled in the art and can be found in Current 
Protocols in Molecular Biology, John Wiley & Sons, N.Y., 6.3.1-6.3.6, 1991. Moderate 
hybridization conditions are defined as equivalent to hybridization in 2 X sodium 
chloride/sodium citrate (SSC) at 30°C, followed by a wash in 1 X SSC, 0.1% SDS at 50°C. 
Highly stringent conditions are defined as equivalent to hybridization in 6 X sodium 
chloride/sodium citrate (SSC) at 45°C, followed by a wash in 0.2 X SSC, 0.1% SDS at 65°C. 

The invention also encompasses: (a) vectors (see below) that contain any of the 
foregoing coding sequences and/or their complements (that is, "antisense" sequences); 
(b) expression vectors that contain any of the foregoing coding sequences operably linked to any 
transcriptional/translational regulatory elements (examples of which are given below) necessary 
to direct expression of the coding sequences; (c) expression vectors encoding, in addition to a 
polypeptide encoded by any of the foregoing sequences, a sequence unrelated to the polypeptide, 
such as a reporter, a marker, or a signal peptide fused to the polypeptide; and (d) genetically 
engineered host cells (see below) that contain any of the foregoing expression vectors and 
thereby express the nucleic acid molecules of the invention. 

Recombinant nucleic acid molecules can contain a sequence encoding a polypeptide of 
the invention having a heterologous signal sequence. The full length polypeptide of the 
invention, or a fragment thereof, may be fused to such heterologous signal sequences or to 
additional polypeptides, as described below. Similarly, the nucleic acid molecules of the 
invention can encode the mature forms of the polypeptides of the invention or forms that include 
an exogenous polypeptide that facilitates secretion. 

The transcriptional/translational regulatory elements referred to above include but are not 
limited to inducible and non-inducible promoters, enhancers, operators and other elements that 
are known to those skilled in the art and that drive or otherwise regulate gene expression. Such 
regulatory elements include but are not limited to the cytomegalovirus hCMV immediate early 
gene, the early or late promoters of SV40 adenovirus, the lac system, the trp system, the TAC 
system, the TRC system, the major operator and promoter regions of phage λ, the control 
regions of fd coat protein, the promoter for 3-phosphoglycerate kinase, the promoters of acid 
phosphatase, and the promoters of the yeast α-mating factors. 

Similarly, the nucleic acid can form part of a hybrid gene encoding additional 
polypeptide sequences, for example, a sequence that functions as a marker or reporter. Examples 
of marker and reporter genes include β-lactamase, chloramphenicol acetyltransferase (CAT), 
adenosine deaminase (ADA), aminoglycoside phosphotransferase (neoʳ, G418ʳ), dihydrofolate 
reductase (DHFR), hygromycin-B-phosphotransferase (HPH), thymidine kinase (TK), lacZ 
(encoding β-galactosidase), and xanthine guanine phosphoribosyltransferase (XGPRT). As with 
many of the standard procedures associated with the practice of the invention, skilled artisans 
will be aware of additional useful reagents, for example, additional sequences that can serve the 
function of a marker or reporter. Generally, the hybrid polypeptide will include a first portion 
and a second portion; the first portion being one of the proteins encoded by genes corresponding 
to the SAGE tags listed in Fig. 7 (or a functional fragment of such a protein) and the second 
portion being, for example, one of the reporters described above or an Ig constant region or part 
of an Ig constant region, e.g., the CH2 and CH3 domains of IgG2a heavy chain. Other hybrids 
could include an antigenic tag or His tag to facilitate purification. 

The expression systems that may be used for purposes of the invention include but are 
not limited to microorganisms such as bacteria (for example, E. coli and B. subtilis) transformed 
with recombinant bacteriophage DNA, plasmid DNA, or cosmid DNA expression vectors 
containing the nucleic acid molecules of the invention; yeast (for example, Saccharomyces and 
Pichia) transformed with recombinant yeast expression vectors containing the nucleic acid 
molecule of the invention; insect cell systems infected with recombinant virus expression vectors 
(for example, baculovirus) containing the nucleic acid molecule of the invention; plant cell 
systems infected with recombinant virus expression vectors (for example, cauliflower mosaic 
virus (CaMV) or tobacco mosaic virus (TMV)) or transformed with recombinant plasmid 
expression vectors (for example, Ti plasmid) containing any of the nucleotide sequences recited above; 
or mammalian cell systems (for example, COS, CHO, BHK, 293, VERO, HeLa, MDCK, WI38, 
and NIH 3T3 cells) harboring recombinant expression constructs containing promoters derived 
from the genome of mammalian cells (for example, the metallothionein promoter) or from 
mammalian viruses (for example, the adenovirus late promoter and the vaccinia virus 7.5K 
promoter). Also useful as host cells are primary or secondary cells obtained directly from a 
mammal and transfected with a plasmid vector or infected with a viral vector. 

Polypeptides and Polypeptide Fragments 

The polypeptides of the invention include all those encoded by the nucleic acids 
described above and functional fragments of these polypeptides. The polypeptides embraced by 
the invention also include fusion proteins that contain either a full-length polypeptide, or a 
functional fragment thereof, fused to an unrelated amino acid sequence. The unrelated sequences 
can be additional functional domains or signal peptides. The polypeptides can be any of those 
described above but with not more than 50 (e.g., not more than: 50; 40; 30; 25; 20; 15; 12; 10; 
nine; eight; seven; six; five; four; three; two; or one) conservative substitutions. Conservative 
substitutions typically include substitutions within the following groups: glycine and alanine; 
valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine, glutamine, serine and 
threonine; lysine, histidine and arginine; and phenylalanine and tyrosine. All that is required of a 
polypeptide with one or more conservative substitutions is that it have at least 5% (e.g., at least: 
5%; 10%; 20%; 30%; 40%; 50%; 60%; 70%; 80%; 90%; 95%; 98%; 99%; 100%; or more) of 
the activity (e.g., ability to inhibit proliferation of breast cancer cells) of the relevant wild-type, 
mature polypeptide. 
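
As a non-limiting illustration, the substitution groups recited above can be used to count conservative substitutions between a wild-type polypeptide and a variant of the same length. The following Python sketch assumes one-letter amino acid codes and position-by-position comparison without gaps; the function name and the default limit argument are hypothetical and are introduced only for this example.

```python
# Conservative substitution groups as recited in the text (one-letter codes):
# Gly/Ala; Val/Ile/Leu; Asp/Glu; Asn/Gln/Ser/Thr; Lys/His/Arg; Phe/Tyr.
CONSERVATIVE_GROUPS = [
    set("GA"),
    set("VIL"),
    set("DE"),
    set("NQST"),
    set("KHR"),
    set("FY"),
]


def classify_substitutions(wild_type: str, variant: str) -> tuple[int, int]:
    """Return (conservative, non_conservative) substitution counts.

    Assumes equal-length, ungapped sequences compared position by position.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = 0
    non_conservative = 0
    for wt, var in zip(wild_type.upper(), variant.upper()):
        if wt == var:
            continue  # unchanged position
        if any(wt in group and var in group for group in CONSERVATIVE_GROUPS):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


def within_conservative_limit(wild_type: str, variant: str, limit: int = 50) -> bool:
    """True if the variant differs only by conservative substitutions, at most `limit` of them."""
    conservative, non_conservative = classify_substitutions(wild_type, variant)
    return non_conservative == 0 and conservative <= limit
```

The sketch addresses only the sequence criterion; whether a variant retains the required fraction of wild-type activity would still have to be determined experimentally.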

Polypeptides of the invention and those useful for the invention can be purified from 
natural sources (e.g., blood, serum, plasma, tissues or cells such as normal breast or cancerous 
breast epithelial cells (of the luminal type), myoepithelial cells, leukocytes, or endothelial cells). 
Smaller peptides (less than 50 amino acids long) can also be conveniently synthesized by 
standard chemical means. In addition, both polypeptides and peptides can be produced by 
standard in vitro recombinant DNA techniques and in vivo transgenesis, using nucleotide 
sequences encoding the appropriate polypeptides or peptides. Methods well-known to those 
skilled in the art can be used to construct expression vectors containing relevant coding 
sequences and appropriate transcriptional/translational control signals. See, for example, the 
techniques described in Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.) 
[Cold Spring Harbor Laboratory, N.Y., 1989], and Ausubel et al., Current Protocols in 
Molecular Biology [Greene Publishing Associates and Wiley Interscience, N.Y., 1989]. 

Polypeptides and fragments of the invention, and those useful for the invention, also 
include those described above, but modified for in vivo use by the addition, at the amino- and/or 
carboxyl-terminal ends, of a blocking agent to facilitate survival of the relevant polypeptide in 
vivo. This can be useful in those situations in which the peptide termini tend to be degraded by 
proteases prior to cellular uptake. Such blocking agents can include, without limitation, 
additional related or unrelated peptide sequences that can be attached to the amino and/or 
carboxyl terminal residues of the peptide to be administered. This can be done either chemically 
during the synthesis of the peptide or by recombinant DNA technology by methods familiar to 
artisans of average skill. 

Alternatively, blocking agents such as pyroglutamic acid or other molecules known in the 
art can be attached to the amino and/or carboxyl terminal residues, or the amino group at the 
amino terminus or carboxyl group at the carboxyl terminus can be replaced with a different 
moiety. Likewise, the peptides can be covalently or noncovalently coupled to pharmaceutically 
acceptable "carrier" proteins prior to administration. 

Also of interest are peptidomimetic compounds that are designed based upon the amino 
acid sequences of the functional peptide fragments. Peptidomimetic compounds are synthetic 
compounds having a three-dimensional conformation (i.e., a "peptide motif") that is substantially 
the same as the three-dimensional conformation of a selected peptide. The peptide motif 
provides the peptidomimetic compound with the ability to inhibit the pathogenesis of breast 
cancer cells in a manner qualitatively identical to that of the functional fragment from which the 
peptidomimetic was derived. Peptidomimetic compounds can have additional characteristics 
that enhance their therapeutic utility, such as increased cell permeability and prolonged 
biological half-life. 

The peptidomimetics typically have a backbone that is partially or completely non- 
peptide, but with side groups that are identical to the side groups of the amino acid residues that 
occur in the peptide on which the peptidomimetic is based. Several types of chemical bonds, 
e.g., ester, thioester, thioamide, retroamide, reduced carbonyl, dimethylene and ketomethylene 
bonds, are known in the art to be generally useful substitutes for peptide bonds in the 
construction of protease-resistant peptidomimetics. 

In the sections below, a "gene X" represents any of the genes listed in Tables 1-16; 
mRNA transcribed from gene X is referred to as "mRNA X"; protein encoded by gene X is 
referred to as "protein X"; and cDNA produced from mRNA X is referred to as "cDNA X". It is 
understood that, unless otherwise stated, descriptions containing these terms are applicable to 
any of the genes listed in Tables 1-16, mRNAs transcribed from such genes, proteins encoded by 
such genes, or cDNAs produced from the mRNAs. 

Diagnostic assays 

The invention features diagnostic assays. Such assays are based on the findings that: 
(a) certain genes are expressed at a higher level, or a lower level, in breast epithelial cancer cells 
(or non-epithelial cells within a relevant breast tumor) compared to normal cells of the same 
types; and (b) breast cancers of various grades and/or stages differ from each other in terms of 
the patterns of genes they express and in the levels at which they express them. These findings 
provide the bases for assays to diagnose breast cancer and to define the grade and/or stage of a 
breast cancer. Such assays can be used on their own or, preferably, in conjunction with other 
procedures to diagnose breast cancer and/or identify the grade and/or stage of progression of a 
breast cancer. 

The diagnostic assays of the invention generally involve testing for levels of expression 
of one or a plurality of the genes listed in Tables 1-16. By testing for levels of expression in a 
cell of a plurality of genes, one obtains an "expression profile" of the cell. 

In the assays of the invention either: (1) the presence of protein X or mRNA X in cells is 
tested for or their levels in cells are measured; or (2) the level of protein X is measured in a 
liquid sample such as a body fluid (e.g., urine, saliva, semen, blood, or serum or plasma derived 
from blood); a lavage such as a breast duct lavage, lung lavage, a gastric lavage, a rectal or 
colonic lavage, or a vaginal lavage; an aspirate such as a nipple aspirate; or a fluid such as a 
supernatant from a cell culture. In order to test for the presence, or measure the level, of mRNA 
X in cells, the cells can be lysed and total RNA can be purified or semi-purified from lysates by 
any of a variety of methods known in the art. Methods of detecting or measuring levels of 
particular mRNA transcripts are also familiar to those in the art. Such assays include, without 
limitation, hybridization assays using detectably labeled mRNA X-specific DNA or RNA probes 
and quantitative or semi-quantitative RT-PCR methodologies employing appropriate mRNA X 
and cDNA X-specific oligonucleotide primers. Additional methods for quantitating mRNA in 
cell lysates include RNA protection assays and serial analysis of gene expression (SAGE). 
Alternatively, qualitative, quantitative, or semi-quantitative in situ hybridization assays can be 
carried out using, for example, tissue sections or unlysed cell suspensions, and detectably (e.g., 
fluorescently or enzyme) labeled DNA or RNA probes. 

Methods of detecting or measuring the levels of a protein of interest in cells are known 
in the art. Many such methods employ antibodies (e.g., polyclonal antibodies or monoclonal 
antibodies (mAbs)) that bind specifically to the protein. In such assays, the antibody itself or a 
secondary antibody that binds to it can be detectably labeled. Alternatively, the antibody can be 
conjugated with biotin, and detectably labeled avidin (a protein that binds to biotin) can be used 
to detect the presence of the biotinylated antibody. Combinations of these approaches (including 
"multi-layer" assays) familiar to those in the art can be used to enhance the sensitivity of assays. 
Some of these assays (e.g., immunohistological methods or fluorescence flow cytometry) can be 
applied to histological sections or unlysed cell suspensions. The methods described below for 
detecting protein X in a liquid sample can also be used to detect protein X in cell lysates. 

Methods of detecting protein X in a liquid sample (see above) basically involve 
contacting a sample of interest with an antibody that binds to protein X and testing for binding of 
the antibody to a component of the sample. In such assays the antibody need not be detectably 
labeled and can be used without a second antibody that binds to protein X. For example, by 
exploiting the phenomenon of surface plasmon resonance, an antibody specific for protein X 
bound to an appropriate solid substrate is exposed to the sample. Binding of protein X to the 
antibody on the solid substrate results in a change in the intensity of surface plasmon resonance 
that can be detected qualitatively or quantitatively by an appropriate instrument, e.g., a Biacore 
apparatus (Biacore International AB, Rapsgatan, Sweden). 

Moreover, assays for detection of protein X in a liquid sample can involve the use, for 
example, of: (a) a single protein X-specific antibody that is detectably labeled; (b) an unlabeled 
protein X-specific antibody and a detectably labeled secondary antibody; or (c) a biotinylated 
protein X-specific antibody and detectably labeled avidin. In addition, as described above for 
detection of proteins in cells, combinations of these approaches (including "multi-layer" assays) 
familiar to those in the art can be used to enhance the sensitivity of assays. In these assays, the 
sample (or an aliquot of the sample) suspected of containing protein X can be immobilized on a 
solid substrate such as a nylon or nitrocellulose membrane by, for example, "spotting" an aliquot 
of the liquid sample or by blotting of an electrophoretic gel on which the sample or an aliquot of 
the sample has been subjected to electrophoretic separation. The presence or amount of protein 
X on the solid substrate is then assayed using any of the above-described forms of the protein X- 
specific antibody and, where required, appropriate detectably labeled secondary antibodies or 
avidin. 

The invention also features "sandwich" assays. In these sandwich assays, instead of 
immobilizing samples on solid substrates by the methods described above, any protein X that 
may be present in a sample can be immobilized on the solid substrate by, prior to exposing the 
solid substrate to the sample, conjugating a second ("capture") protein X-specific antibody 
(polyclonal or mAb) to the solid substrate by any of a variety of methods known in the art. In 
exposing the sample to the solid substrate with the second protein X-specific antibody bound to 
it, any protein X in the sample (or sample aliquot) will bind to the second protein X-specific 
antibody on the solid substrate. The presence or amount of protein X bound to the conjugated 
second protein X-specific antibody is then assayed using a "detection" protein X-specific 
antibody by methods essentially the same as those described above using a single protein X- 
specific antibody. It is understood that in these sandwich assays, the capture antibody should not 
bind to the same epitope (or range of epitopes in the case of a polyclonal antibody) as the 
detection antibody. Thus, if a mAb is used as a capture antibody, the detection antibody can be 
either: (a) another mAb that binds to an epitope that is either completely physically separated 
from or only partially overlaps with the epitope to which the capture mAb binds; or (b) a 
polyclonal antibody that binds to epitopes other than or in addition to that to which the capture 
mAb binds. On the other hand, if a polyclonal antibody is used as a capture antibody, the 
detection antibody can be either: (a) a mAb that binds to an epitope that is either completely 
physically separated from or partially overlaps with any of the epitopes to which the capture 
polyclonal antibody binds; or (b) a polyclonal antibody that binds to epitopes other than or in 
addition to that to which the capture polyclonal antibody binds. Assays which involve the use 
of a capture and detection antibody include sandwich ELISA assays, sandwich Western blotting 
assays, and sandwich immunomagnetic detection assays. 

Suitable solid substrates to which the capture antibody can be bound include, without 
limitation, the plastic bottoms and sides of wells of microtiter plates, membranes such as nylon 
or nitrocellulose membranes, and polymeric (e.g., without limitation, agarose, cellulose, or 
polyacrylamide) beads or particles. It is noted that protein X-specific antibodies bound to such 
beads or particles can also be used for immunoaffinity purification of protein X. 

Methods of detecting or quantifying a detectable label depend on the nature of the 
label and are known in the art. Appropriate labels include, without limitation, radionuclides 
(e.g., ¹²⁵I, ¹³¹I, ³⁵S, ³H, ³²P, ³³P, or ¹⁴C), fluorescent moieties (e.g., fluorescein, rhodamine, or 
phycoerythrin), luminescent moieties (e.g., Qdot™ nanoparticles supplied by the Quantum Dot 
Corporation, Palo Alto, CA), compounds that absorb light of a defined wavelength, or enzymes 
(e.g., alkaline phosphatase or horseradish peroxidase). The products of reactions catalyzed by 
appropriate enzymes can be, without limitation, fluorescent, luminescent, or radioactive or they 
may absorb visible or ultraviolet light. Examples of detectors include, without limitation, x-ray 
film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, 
fluorometers, luminometers, and densitometers. 

In assays, for example, to diagnose breast cancer, the level of protein X in, for example, 
serum (or a breast cell) from a patient suspected of having, or at risk of having, breast cancer is 
compared to the level of protein X in sera (or breast cells) from a control subject (e.g., a subject 
not having breast cancer) or the mean level of protein X in sera (or breast cells) from a control 
group of subjects (e.g., subjects not having breast cancer). A significantly higher level, or lower 
level (depending on whether the gene of interest is expressed at a higher or lower level in breast 
cancer or associated stromal cells), of protein X in the serum (or breast cells) of the patient 
relative to the mean level in sera (or breast cells) of the control group would indicate that the 
patient has breast cancer. Alternatively, if a sample of the subject's serum (or breast cells) that 
was obtained at a prior date at which the patient clearly did not have breast cancer is available, 
the level of protein X in the test serum (or breast cell) sample can be compared to the level in the 
previously obtained sample. A higher level, or lower level (depending on whether the gene of interest 
is expressed at a higher or lower level in breast cancer or associated stromal cells) in the test serum 
(or breast cell) sample would be an indication that the patient has breast cancer. 
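
Purely as an illustrative sketch, the comparison of a test level with the mean level of a control group can be expressed as follows in Python. The specification does not prescribe a statistical test; the z-score against the control mean and sample standard deviation, the cutoff of 2.0, and the function name are all assumptions made for this example.

```python
from statistics import mean, stdev


def compare_to_controls(test_level: float, control_levels: list[float],
                        z_cutoff: float = 2.0) -> str:
    """Classify a test level as 'higher', 'lower', or 'not significantly different'.

    Assumption for illustration: significance is judged by a z-score relative
    to the control group's mean and sample standard deviation.
    """
    if len(control_levels) < 2:
        raise ValueError("at least two control measurements are needed")
    control_mean = mean(control_levels)
    control_sd = stdev(control_levels)
    if control_sd == 0:
        raise ValueError("control levels show no variation; z-score is undefined")
    z = (test_level - control_mean) / control_sd
    if z >= z_cutoff:
        return "higher"
    if z <= -z_cutoff:
        return "lower"
    return "not significantly different"
```

Whether a "higher" or a "lower" result is the one indicative of breast cancer depends, as stated above, on whether the gene of interest is expressed at a higher or lower level in breast cancer or associated stromal cells.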

Moreover, a test expression profile of a gene in a test cell (or tissue) can be compared to 
control expression profiles of control cells (or tissues) previously established to be of a defined 
category (e.g., DCIS grade, breast cancer stage, or state of differentiation). The category of the 
test cell (or tissue) will be that of the control cell (or tissue) whose expression profile the test 
cell's (or tissue's) expression profile most closely resembles. These expression profile 
comparison assays can be used to compare normal breast tissue with any stage and/or 
grade of breast cancer recited herein and/or to compare between breast cancer grades and stages. 
The genes analyzed can be any of those listed in Tables 1-16 and the number of genes analyzed 
can be any number, i.e., one or more. Generally, at least two (e.g., at least: two; three; four; five; 
six; seven; eight; nine; ten; 11; 12; 13; 14; 15; 17; 18; 20; 23; 25; 30; 35; 40; 45; 50; 60; 70; 80; 
90; 100; 120; 150; 200; 250; 300; 350; 400; 450; 500; or more) genes will be analyzed. It is 
understood that the genes analyzed will include at least one of those listed herein but can also 
include others not listed herein. 
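
The assignment of a test profile to the category of the most closely resembling control profile can be sketched, by way of example only, as a nearest-profile comparison. In the following Python sketch, a profile is represented as a mapping from gene name to expression level and resemblance is measured by Euclidean distance; both choices, and the function names, are assumptions made for this example, since the specification does not fix a particular similarity measure.

```python
import math


def profile_distance(test_profile: dict[str, float],
                     control_profile: dict[str, float]) -> float:
    """Euclidean distance over the genes in the test profile.

    Raises KeyError if the control profile lacks a gene measured in the test
    profile, so that all distances are computed over the same gene set.
    """
    return math.sqrt(sum((level - control_profile[gene]) ** 2
                         for gene, level in test_profile.items()))


def classify_profile(test_profile: dict[str, float],
                     control_profiles: dict[str, dict[str, float]]) -> str:
    """Return the category whose control profile is closest to the test profile."""
    if not test_profile or not control_profiles:
        raise ValueError("a test profile and at least one control profile are required")
    return min(control_profiles,
               key=lambda category: profile_distance(test_profile,
                                                     control_profiles[category]))
```

Other measures of resemblance (for example, correlation-based measures or hierarchical clustering of the kind depicted as a dendrogram) could be substituted for the distance function without changing the overall scheme.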

One of skill in the art will appreciate from this description how similar "test level" versus 
"control level" comparisons can be made between other test and control samples described 
herein. 

It is noted that the patients and control subjects referred to above need not be human 
patients. They can be for example, non-human primates (e.g., monkeys), horses, sheep, cattle, 
goats, pigs, dogs, guinea pigs, hamsters, rats, rabbits or mice. 

Methods of Inhibiting Expression of Genes 

Also included in the invention are methods of inhibiting expression of the genes listed in 
Tables 2-10, 15, and 16 in cells, e.g., breast epithelial cancer cells and/or stromal cells (e.g., 
leukocytes, myoepithelial cells, myofibroblasts, endothelial cells, or fibroblasts) in a tumor 
containing the cancer cells; such methods are applicable where the expression of protein X in 
breast cancer cells, or stromal cells in a breast tumor, is higher than in corresponding normal 
cells. These methods can also be adapted to inhibit expression of a receptor for a ligand protein 
X. One such method involves introducing into a cell (a) an antisense oligonucleotide or (b) a 
nucleic acid comprising a transcriptional regulatory element (TRE) operably linked to a nucleic 
acid sequence that is transcribed in the cell into an antisense RNA. The antisense oligonucleotide and 
the antisense RNA hybridize to an mRNA X molecule (or an mRNA molecule encoding a receptor 
for a ligand protein X) and have the effect in the cell of inhibiting expression of protein X (or 
a receptor for protein X) in the cell. Inhibiting protein X/protein X receptor expression in the 
breast cancer cells or stromal cells can inhibit pathogenesis of breast cancer cells. The method 
can thus be useful in inhibiting pathogenesis of a breast cancer cell and can be applied to the 
therapy of breast cancer, e.g., DCIS, invasive breast cancer, or metastatic breast cancer. 

Antisense compounds are generally used to interfere with protein expression either by, 
for example, interfering directly with translation of a target mRNA molecule, by RNase H-mediated 
degradation of the target mRNA, by interference with 5' capping of mRNA, by 
prevention of translation factor binding to the target mRNA by masking of the 5' cap, or by 
inhibiting mRNA polyadenylation. The interference with protein expression arises from the 
hybridization of the antisense compound with its target mRNA. A specific targeting site on a 
target mRNA of interest for interaction with an antisense compound is chosen. Thus, for 
example, for modulation of polyadenylation a preferred target site on an mRNA target is a 
polyadenylation signal or a polyadenylation site. For diminishing mRNA stability or 
degradation, destabilizing sequences are preferred target sites. Once one or more target sites have 
been identified, oligonucleotides are chosen which are sufficiently complementary to the target 
site (i.e., hybridize sufficiently well under physiological conditions and with sufficient 
specificity) to give the desired effect. 

With respect to this invention, the term "oligonucleotide" refers to an oligomer or 
polymer of RNA, DNA, or a mimetic of either. The term includes oligonucleotides composed of 
naturally-occurring nucleobases, sugars, and covalent internucleoside (backbone) linkages. The 
normal linkage or backbone of RNA and DNA is a 3' to 5' phosphodiester bond. The term also 
refers, however, to oligonucleotides composed entirely of, or having portions containing, non-
naturally occurring components which function in a similar manner to the oligonucleotides
containing only naturally-occurring components. Such modified or substituted oligonucleotides are
often preferred over native forms because of desirable properties such as, for example, enhanced 
cellular uptake, enhanced affinity for target sequence, and increased stability in the presence of 
nucleases. In the mimetics, the core base (pyrimidine or purine) structure is generally preserved 
but (1) the sugars are either modified or replaced with other components and/or (2) the inter- 
nucleobase linkages are modified. One class of nucleic acid mimetic that has proven to be very 
useful is referred to as peptide nucleic acid (PNA). In PNA molecules the sugar backbone is
replaced with an amide-containing backbone, in particular an aminoethylglycine backbone. The 
bases are retained and are bound directly to the aza nitrogen atoms of the amide portion of the
backbone. PNA and other mimetics useful in the instant invention are described in detail in U.S.
Patent No. 6,210,289, which is incorporated herein by reference in its entirety. 

The antisense oligomers to be used in the methods of the invention generally comprise
about 8 to about 100 (e.g., about 14 to about 80 or about 14 to about 35) nucleobases (or
nucleosides where the nucleobases are naturally occurring).
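
By way of illustration only, choosing an oligonucleotide complementary to a selected target site can be sketched in Python as follows. The sketch assumes simple Watson-Crick pairing and an unmodified DNA oligonucleotide. The sequence `mrna_x`, the function name `antisense_oligo`, and the chosen site are hypothetical and are not taken from the tables of this application. The sketch does not model hybridization strength or specificity under physiological conditions, which would have to be assessed separately.

```python
# Watson-Crick partners for an RNA target; the antisense oligo is written as DNA.
_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna: str, start: int, length: int) -> str:
    """Return a DNA oligo (5'->3') complementary to mrna[start:start + length]."""
    if not 8 <= length <= 100:
        raise ValueError("length outside the approximate 8-100 nucleobase range")
    site = mrna.upper()[start:start + length]
    if len(site) != length:
        raise ValueError("target site runs past the end of the mRNA")
    # Reverse the site, then complement each base, so the result reads 5'->3'.
    return "".join(_PAIR[base] for base in reversed(site))


# Hypothetical target containing an AAUAAA polyadenylation signal.
mrna_x = "GGCUAGCAAUAAAGCUUGCAUCC"
oligo = antisense_oligo(mrna_x, start=5, length=14)
print(oligo)
```

Here the 14-nucleobase site was chosen to span the polyadenylation signal, consistent with the preference for such sites when polyadenylation is to be modulated.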

The antisense oligonucleotides can themselves be introduced into a cell or an expression 
vector containing a nucleic sequence (operably linked to a TRE) encoding the antisense 
oligonucleotide can be introduced into the cell. In the latter case, the oligonucleotide produced 
by the expression vector is an RNA oligonucleotide and the RNA oligonucleotide will be 
composed entirely of naturally occurring components. 

The methods of the invention can be in vitro or in vivo. In vitro applications of the 
methods can be useful, for example, in basic scientific studies on cancer cell pathogenesis, 
e.g., cancer cell proliferation and/or cell survival. In such in vitro methods, appropriate cells (see
above) can be incubated for various lengths of time with (a) the antisense oligonucleotides or
(b) expression vectors containing nucleic acid sequences encoding the antisense oligonucleotides
at a variety of concentrations. Other incubation conditions known to those in the art
(e.g., temperature or cell concentration) can also be varied. Inhibition of protein X expression
can be tested by methods known to those in the art. However, the methods of the invention will
preferably be in vivo. 

As used herein, "prophylaxis" can mean complete prevention of the symptoms of a 
disease (e.g., breast cancer such as DCIS), a delay in onset of the symptoms of a disease, or a 
lessening in the severity of subsequently developed disease symptoms. "Prevention" should 
mean that symptoms of the disease (e.g., breast cancer) are essentially absent. As used herein, 
"therapy" can mean a complete abolishment of the symptoms of a disease or a decrease in the 
severity of the symptoms of the disease. As used herein, a "protective" regimen is a regimen that 
is prophylactic and/or therapeutic. 

The antisense methods are generally useful for cancer cell (e.g., breast cancer cell)
pathogenesis-inhibiting therapy or prophylaxis. The antisense compounds can be administered to
mammalian subjects (e.g., human breast cancer patients) alone or in conjunction with other drugs
and/or radiotherapy.

Where antisense oligonucleotides per se are administered, they can be suspended in a
pharmaceutically-acceptable carrier (e.g., physiological saline) and administered orally,
intrarectally, intravaginally, intranasally, intragastrically, intratracheally, or intrapulmonarily, or
injected subcutaneously, intramuscularly, intrathecally, intraperitoneally, or intravenously. They
can also be delivered directly to tumor cells, e.g., to a tumor or a tumor bed following surgical 
excision of the tumor, in order to kill any remaining tumor cells. The dosage required depends 
on the choice of the route of administration; the nature of the formulation; the nature of the 
patient's illness; the subject's size, weight, surface area, age, and sex; other drugs being
administered; and the judgment of the attending physician. Suitable dosages are generally in the 
range of 0.01 mg/kg - 100 mg/kg. Wide variations in the needed dosage are to be expected in 
view of the variety of compounds available and the differing efficiencies of various routes of 
administration. For example, oral administration would be expected to require higher dosages 
than administration by intravenous injection. Variations in these dosage levels can be adjusted 
using standard empirical routines for optimization as is well understood in the art. 
Administrations can be single or multiple (e.g., 2-, 3-, 4-, 6-, 8-, 10-, 20-, 50-, 100-, 150-, or more
fold). Encapsulation of the oligonucleotide in a suitable delivery vehicle (e.g., polymeric
microparticles or implantable devices) may increase the efficiency of delivery, particularly for 
oral delivery. 

Where an expression vector containing a nucleic sequence (operably linked to a TRE) 
encoding the antisense oligonucleotide is administered to a subject, expression of the coding 
sequence can be directed to any cell in the body of the subject. However, expression will 
preferably be directed to cells in a tumor containing the cancer cells or cells in the immediate 
vicinity of the cancer cells whose pathogenesis it is desired to inhibit. Expression of the coding 
sequence can be directed to the tumor cells themselves. This can be achieved by, for example, 
the use of polymeric, biodegradable microparticle or microcapsule delivery devices known in the 
art.

Another way to achieve uptake of the nucleic acid is using liposomes, prepared by 
standard methods. The vectors can be incorporated alone into these delivery vehicles or co- 
incorporated with tissue-specific or tumor-specific antibodies. Alternatively, one can prepare a 
molecular conjugate composed of a plasmid or other vector attached to poly-L-lysine by 
electrostatic or covalent forces. Poly-L-lysine binds to a ligand that can bind to a receptor on 
target cells [Cristiano et al. (1995), J. Mol. Med. 73:479]. Alternatively, tissue-specific targeting
can be achieved by the use of tissue-specific transcriptional/translational regulatory elements
(TRE), e.g., promoters and enhancers, which are known in the art. Delivery of "naked DNA" 
(i.e., without a delivery vehicle) to an intramuscular, intradermal, or subcutaneous site is another 
means to achieve in vivo expression. 

Enhancers provide expression specificity in terms of time, location, and level. Unlike a 
promoter, an enhancer can function when located at variable distances from the transcription 
initiation site, provided a promoter is present. An enhancer can also be located downstream of 
the transcription initiation site. To bring a coding sequence under the control of a promoter, it is 
necessary to position the translation initiation site of the translational reading frame of the 
peptide or polypeptide between one and about fifty nucleotides downstream (3') of the promoter.
The coding sequence of the expression vector is operatively linked to a transcription terminating 
region. 

The transcriptional/translational regulatory elements referred to above include, but are 
not limited to, inducible and non-inducible promoters, enhancers, operators and other elements 
that are known to those skilled in the art and that drive or otherwise regulate gene expression. 
Examples of such regulatory elements are provided above in the section on Nucleic Acids. 

Suitable expression vectors include plasmids and viral vectors such as herpes viruses, 
retroviruses, vaccinia viruses, attenuated vaccinia viruses, canary pox viruses, adenoviruses and
adeno-associated viruses, among others. 

Polynucleotides can be administered in a pharmaceutically acceptable carrier.
Pharmaceutically acceptable carriers are biologically compatible vehicles that are suitable for
administration to a human, e.g., physiological saline or liposomes. A therapeutically effective
amount is an amount of the polynucleotide that is capable of producing a medically desirable
result (e.g., decreased proliferation and/or survival of breast cancer cells) in a treated animal. As
is well known in the medical arts, the dosage for any one patient depends upon many factors, 
including the patient's size, body surface area, age, the particular compound to be administered, 
sex, time and route of administration, general health, and other drugs being administered 
concurrently. Dosages will vary, but a preferred dosage for administration of polynucleotide is 
from approximately 10⁶ to approximately 10¹² copies of the polynucleotide molecule. This dose
can be repeatedly administered as needed. Routes of administration can be any of those listed
above. 
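
As a rough worked example, a copy number in the range given above can be converted to a mass of DNA. The sketch below assumes a double-stranded DNA vector with an average mass of about 650 g/mol per base pair and a hypothetical 5,000 base-pair plasmid. Neither figure comes from this disclosure, and the function name `copies_to_micrograms` is likewise illustrative.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0   # assumed average for double-stranded DNA


def copies_to_micrograms(copies: float, length_bp: int) -> float:
    """Mass, in micrograms, of `copies` molecules of a dsDNA of `length_bp` base pairs."""
    grams = copies * length_bp * GRAMS_PER_MOL_PER_BP / AVOGADRO
    return grams * 1e6


# Hypothetical 5,000 bp plasmid at the two ends of the stated copy-number range.
low = copies_to_micrograms(1e6, 5000)
high = copies_to_micrograms(1e12, 5000)
print(f"{low:.2e} ug to {high:.2f} ug")
```

Under these assumptions the upper end of the range corresponds to roughly 5.4 micrograms of plasmid, and the lower end to a millionth of that amount.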

Double-stranded interfering RNA (RNAi) homologous to mRNA X can also be used to
reduce expression of protein X in a cell. See, e.g., Fire et al. (1998) Nature 391:806-811;
Romano and Macino (1992) Mol. Microbiol. 6:3343-3353; Cogoni et al. (1996) EMBO J.
15:3153-3163; Cogoni and Macino (1999) Nature 399:166-169; Misquitta and Paterson (1999)
Proc. Natl. Acad. Sci. USA 96:1451-1456; and Kennerdell and Carthew (1998) Cell
95:1017-1026. 

The sense and anti-sense RNA strands of RNAi can be individually constructed using 
chemical synthesis and enzymatic ligation reactions using procedures known in the art. For 
example, each strand can be chemically synthesized using naturally occurring nucleotides or 
variously modified nucleotides designed to increase the biological stability of the molecule or to 
increase the physical stability of the duplex formed between the sense and anti-sense strands, 
e.g., phosphorothioate derivatives and acridine substituted nucleotides. The sense or anti-sense 
strand can also be produced biologically using an expression vector into which a target protein X 
sequence (full-length or a fragment) has been subcloned in a sense or anti-sense orientation. The 
sense and anti-sense RNA strands can be annealed in vitro before delivery of the dsRNA to any 
of the cancer cells disclosed herein. Alternatively, annealing can occur in vivo after the sense and
anti-sense strands are sequentially delivered to the cancer cells. 

Double-stranded RNA interference can also be achieved by introducing into cancer cells 
a polynucleotide from which sense and anti-sense RNAs can be transcribed under the direction 
of separate promoters, or a single RNA molecule containing both sense and anti-sense sequences 
can be transcribed under the direction of a single promoter. 
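
The single-transcript option can be pictured with a short Python sketch in which a sense arm, a loop, and the matching anti-sense arm are joined into one RNA. This is an assumption-laden illustration: the 21-nucleotide `sense` sequence is hypothetical, the nine-nucleotide loop is one commonly used in hairpin constructs and is not specified in this application, and the function names are illustrative.

```python
_RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the RNA strand (5'->3') that base-pairs with `seq`."""
    return "".join(_RNA_PAIR[base] for base in reversed(seq.upper()))


def hairpin_transcript(sense: str, loop: str = "UUCAAGAGA") -> str:
    """Single RNA: sense arm, loop, then the anti-sense arm that folds back onto the sense arm."""
    return sense.upper() + loop + reverse_complement_rna(sense)


sense = "GCAAUAAAGCUUGCAUCCGUA"   # hypothetical fragment of mRNA X
antisense = reverse_complement_rna(sense)
transcript = hairpin_transcript(sense)
print(transcript)
```

The same `reverse_complement_rna` helper gives the separate anti-sense strand for the two-promoter arrangement, in which the sense and anti-sense RNAs are transcribed independently and anneal afterwards.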

Also useful for inhibiting expression of gene X are "small molecule" inhibitors of gene
expression. Such small molecules are useful for inhibiting a function of protein X or a
downstream activity initiated by or via protein X. For example, quinazoline compounds are
useful in inhibiting tyrosine kinase activity that, for example, is stimulated by binding of a ligand
to one of the epidermal growth factor receptors (EGFR), e.g., erbB1 or erbB2. Small molecules of
interest include, without limitation, small non-nucleic acid organic molecules, small inorganic
molecules, peptides, peptoids, peptidomimetics, non-naturally occurring nucleotides, and small
nucleic acids (e.g., RNAi or antisense oligonucleotides). Generally, small molecules have
molecular weights of less than 10 kDa (e.g., less than: 10 kDa; 9 kDa; 8 kDa; 7 kDa; 6 kDa; 5
kDa; 4 kDa; 3 kDa; 2 kDa; or 1 kDa). 

Other methods of interest include the recently described degrakine and intrakine 
techniques [Coffield et al. (2003) Nat. Biotech. 21:1321-1327; Chen et al. (1997) Nat. Med.
3:1110-1116], which result in inhibition of expression, on the surface of a target cell (e.g., a
breast cancer cell), of a receptor for a ligand protein (e.g., a soluble ligand such as a cytokine,
chemokine, or growth factor or a ligand on the surface of another cell). By inhibiting expression
of the receptor on the target cell, responsiveness of the target cell to the ligand protein is 
inhibited or, optimally, prevented. 

In the degrakine methodology, a fusion protein is used to inhibit cell surface expression 
of a receptor for a ligand protein X of interest (e.g., a receptor for CXCL14), the receptor being 
on the surface of a target cell of interest (e.g., a breast cancer cell). The fusion protein is a fusion 
between (a) a ligand protein X (or a fragment of the protein X ligand that retains the ability to 
bind to the receptor for the protein X ligand) and (b) the HIV-1 Vpu protein. The target cell of
interest is contacted in vivo or in vitro with an expression vector (e.g., a viral vector such as any 
of those disclosed herein) expressing the fusion protein. After entry of the expression vector into 
the cell, the fusion protein is produced in the cytoplasm of the target cell. The fusion protein, 
due to the activity of the Vpu protein, then migrates to the endoplasmic reticulum (ER) of the 
target cell where it can bind to recently translated ligand protein X receptor molecules and inhibit 
or, optimally, prevent translocation of the receptor molecules to the surface of the target cell. 
Moreover, it is believed that the Vpu component of the fusion protein bound to newly made 
receptor molecules targets the receptor molecules for degradation by proteasomes within the 
target cell [Coffield et al. (2003)]. 

Intrakine methodologies are conceptually similar to the degrakine methodology. Instead 
of the Vpu protein, a signal sequence that serves to direct proteins containing it to the ER (e.g., 
the four amino acid KDEL (SEQ ID NO: 1956) sequence) is fused to the ligand protein X (or a
fragment of the protein X ligand that retains the ability to bind to the receptor for the ligand
protein X) [Coffield et al. (2003); Chen et al. (1997)].

The degrakine and intrakine methodologies can be modified as follows. The fusion 
protein itself can be contacted (in vivo or in vitro) with a target cell expressing a surface receptor 
for this ligand protein X. The fusion protein can then, e.g., by binding to such a receptor, enter
the cytoplasm of the target cell. The fusion protein then, as in the vector-mediated method
described above, migrates to the ER of the target cell and inhibits translocation of the receptor to 
the target cell surface. 

One of skill in the art will appreciate that RNAi, small molecule, and degrakine/intrakine 
methods can be, as for the antisense methods described above, in vitro and in vivo. Moreover, 
the methods and conditions of delivery for RNAi, small molecule, and degrakine/intrakine methods
are the same as those for antisense oligonucleotides.

The antisense, RNAi, small molecule, and degrakine/intrakine methods of the invention 
can be applied to a wide range of species, e.g., humans, non-human primates, horses, cattle, pigs, 
sheep, goats, dogs, cats, rabbits, guinea pigs, hamsters, rats, and mice.

Passive Immunoprotection

The methods described in this section are applicable where the expression of protein X in 
breast cancer cells, or stromal cells in a breast tumor, is higher than in corresponding normal 
cells. 

As used herein, "passive immunoprotection" means administration of one or more protein 
X-binding agents to a subject that has, is suspected of having, or is at risk of having a breast 
cancer, e.g., a DCIS, an invasive breast cancer, or a metastatic breast cancer. Thus, passive 
immunoprotection can be prophylactic and/or therapeutic. As used herein, "protein X-binding 
agents" are agents that bind to protein X and thereby inhibit the ability of protein X to enhance 
pathogenesis of breast cancer cells. It is understood that the term "inhibit" includes "completely 
inhibit" and "partially inhibit." Protein X-binding agents can be, for example, a soluble (i.e., 
not cell-bound) full length form (or fragment, such as a fragment lacking a transmembrane
domain) of a receptor for protein X (where protein X is a ligand), a soluble, non-agonist form (or
fragment) of a ligand for protein X (where protein X is a receptor), or a non-agonist antibody
specific for protein X. Other useful agents include non-agonist molecules that bind to a receptor 
for a protein X (i.e., protein X receptor-binding agents). Such protein X receptor-binding agents 
include non-agonist antibodies specific for a protein X receptor and non-agonist fragments of a 
protein X that retain the ability to bind to the receptor for protein X. A protein X-binding agent 
(or a protein X receptor-binding agent) useful for the invention has the capacity to inhibit the 
ability of protein X to enhance the pathogenesis (e.g., proliferation and/or survival) of the breast
cancer cells by at least 20% (e.g., at least: 20%; 30%; 40%; 50%; 60%; 70%; 80%; 90%; 95%;
98%; 99%; 99.5%; or even 100%).
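
The 20% criterion can be made concrete with a small calculation. The Python sketch below assumes that pathogenesis is measured by a single numerical readout (for example, a cell count after a fixed culture period) in treated and untreated cultures. The readout, the function names, and the example values are hypothetical and are offered only to show the arithmetic.

```python
def percent_inhibition(control: float, treated: float) -> float:
    """Percent reduction of a pathogenesis readout relative to the untreated control."""
    if control <= 0:
        raise ValueError("control readout must be positive")
    return 100.0 * (control - treated) / control


def meets_threshold(control: float, treated: float, threshold: float = 20.0) -> bool:
    """True if the agent inhibits the readout by at least `threshold` percent."""
    return percent_inhibition(control, treated) >= threshold


# Hypothetical cell counts (arbitrary units) without and with a binding agent.
print(percent_inhibition(200.0, 150.0), meets_threshold(200.0, 150.0))
print(percent_inhibition(200.0, 170.0), meets_threshold(200.0, 170.0))
```

In this example a fall from 200 to 150 units is a 25% inhibition and satisfies the criterion, whereas a fall to 170 units is a 15% inhibition and does not.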

Antibodies can be polyclonal or monoclonal antibodies; methods for producing both 
types of antibody are known in the art. The antibodies can be of any class (e.g., IgM, IgG, IgA,
IgD, or IgE) and be generated in any of the species recited herein. They are preferably IgG 
antibodies. Recombinant antibodies, such as chimeric and humanized monoclonal antibodies 
comprising both human and non-human portions, can also be used in the methods of the 
invention. Such chimeric and humanized monoclonal antibodies can be produced by 
recombinant DNA techniques known in the art, for example, using methods described in
Robinson et al., International Patent Publication PCT/US86/02269; Akira et al., European Patent
Application 184,187; Taniguchi, European Patent Application 171,496; Morrison et al., European
Patent Application 173,494; Neuberger et al., PCT Application WO 86/01533; Cabilly et al., U.S. 
Patent No. 4,816,567; Cabilly et al., European Patent Application 125,023; Better et al. (1988) 
Science 240, 1041-43; Liu et al. (1987) J. Immunol. 139, 3521-26; Sun et al. (1987) PNAS 84, 
214-18; Nishimura et al. (1987) Canc. Res. 47, 999-1005; Wood et al. (1985) Nature 314, 446-
49; Shaw et al. (1988) J. Natl. Cancer Inst. 80, 1553-59; Morrison, (1985) Science 229, 1202-07;
Oi et al. (1986) BioTechniques 4, 214; Winter, U.S. Patent No. 5,225,539; Jones et al. (1986) 
Nature 321, 552-25; Veroeyan et al. (1988) Science 239, 1534; and Beidler et al. (1988) J. 
Immunol. 141, 4053-60.

Also useful for the invention are antibody fragments and derivatives that contain at least 
the functional portion of the antigen-binding domain of an antibody. Antibody fragments that
contain the binding domain of the molecule can be generated by known techniques. Such
fragments include, but are not limited to: F(ab')2 fragments that can be produced by pepsin
digestion of antibody molecules; Fab fragments that can be generated by reducing the disulfide
bridges of F(ab')2 fragments; and Fab fragments that can be generated by treating antibody
molecules with papain and a reducing agent. See, e.g., National Institutes of Health, 1 Current 
Protocols In Immunology, Coligan et al., ed. 2.8, 2.10 (Wiley Interscience, 1991). Antibody 
fragments also include Fv fragments, i.e., antibody products in which there are few or no 
constant region amino acid residues. A single chain Fv fragment (scFv) is a single polypeptide 
chain that includes both the heavy and light chain variable regions of the antibody from which 
the scFv is derived. Such fragments can be produced, for example, as described in U.S. Patent 
No. 4,642,334, which is incorporated herein by reference in its entirety. For a human subject,
the antibody can be a "humanized" version of a monoclonal antibody originally generated in a 
different species. 

The invention includes antibodies specific for the proteins encoded by genes
corresponding to the SAGE tags listed in Fig. 7. The antibodies can be of any of the types and
classes referred to herein.

Protein X-binding (or protein X receptor-binding) agents can be administered to any of 
the species listed herein. The binding agents will preferably, but not necessarily, be of the same 
species as the subject to which they are administered. A single polyclonal or monoclonal 
antibody can be administered, or two or more (e.g., two, three, four, five, six, seven, eight, nine, 
ten, 12, 14, 16, 18, or 20) polyclonal antibodies or monoclonal antibodies can be given. The 
binding agents can be administered to subjects prior to, subsequently to, or at the same time as 
the protein X-expression inhibitors (see above). 

The dosage of protein X/protein X receptor-binding agents required depends on the route 
of administration, the nature of the formulation, the nature of the patient's illness, the subject's
size, weight, surface area, age, and sex, other drugs being administered, and the judgment of the 
attending physician. Suitable dosages are in the range of 0.01-100.0 mg/kg. The protein 
X/protein X receptor-binding agents can be administered by any of the routes disclosed herein, 
but will generally be administered intravenously, intramuscularly, or subcutaneously. Wide 
variations in the needed dosage are to be expected in view of the variety of protein X/protein X 
receptor-binding agents (e.g., protein X-specific antibodies) available and the differing 
efficiencies of various routes of administration. Variations in these dosage levels can be adjusted 
using standard empirical routines for optimization, as is well understood in the art. 
Administrations can be single or multiple (e.g., 2- or 3-, 4-, 6-, 8-, 10-, 20-, 50-, 100-, 150-, or 
more fold). 

Methods to test whether a compound or antibody is therapeutic for, or prophylactic 
against, a particular disease are known in the art. Where a therapeutic effect is being tested, a 
test population displaying symptoms of the disease (e.g., breast cancer such as DCIS) is treated 
with a protein X/protein X receptor expression inhibitor or protein X/protein X receptor-binding 
agent using any of the above-described strategies. A control population, also displaying 
symptoms of the disease, is treated, using the same methodology, with a placebo. Disappearance 
or a decrease of the disease symptoms in the test subjects would indicate that the compound or
antibody was an effective therapeutic agent. By applying the same strategies to subjects at risk 
of having the disease, the compounds and antibodies can be tested for efficacy as prophylactic 
agents. In this situation, prevention of or delay in onset of disease symptoms is tested. 

Methods of Inhibiting Pathogenesis of a Cancer Cell 

Such methods are applicable where the expression of protein X in breast cancer cells, or 
stromal cells in a breast tumor, is lower than in corresponding normal cells (see Tables 1, 3-10, 
and 15). These methods involve contacting a breast cancer cell with a protein X, or a functional 
fragment thereof, in order to inhibit pathogenesis (e.g., proliferation or survival) of the cancer 
cell. Such polypeptides or functional fragments can have amino acid sequences identical to 
wild-type sequences or they can contain not more than 50 (e.g., not more than: 50; 40; 30; 25;
20; 15; 12; 10; nine; eight; seven; six; five; four; three; two; or one) conservative amino acid
substitution(s). Alleles of the polypeptides encoded by the genes listed in Tables 1, 3-10, and 15 are also
useful for the invention. 
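
Whether a candidate sequence falls within the stated limit on conservative substitutions can be checked mechanically. The Python sketch below is illustrative only. It assumes one conventional grouping of amino acids regarded as conservative replacements for one another (glycine/alanine; valine/isoleucine/leucine; aspartic acid/glutamic acid; asparagine/glutamine/serine/threonine; lysine/histidine/arginine; phenylalanine/tyrosine), compares equal-length sequences position by position, and uses hypothetical six-residue peptides rather than any sequence of this application.

```python
# Assumed grouping of residues treated as conservative replacements for one another.
_GROUPS = ["GA", "VIL", "DE", "NQST", "KHR", "FY"]
_GROUP_OF = {aa: idx for idx, group in enumerate(_GROUPS) for aa in group}


def count_substitutions(wild_type: str, variant: str):
    """Return (conservative, non_conservative) substitution counts."""
    if len(wild_type) != len(variant):
        raise ValueError("sketch handles substitutions only, not insertions or deletions")
    conservative = non_conservative = 0
    for wt, var in zip(wild_type.upper(), variant.upper()):
        if wt == var:
            continue
        if wt in _GROUP_OF and _GROUP_OF.get(var) == _GROUP_OF[wt]:
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


def is_permitted_variant(wild_type: str, variant: str, max_conservative: int = 50) -> bool:
    """True if the variant differs only by conservative substitutions within the limit."""
    conservative, non_conservative = count_substitutions(wild_type, variant)
    return non_conservative == 0 and conservative <= max_conservative


print(count_substitutions("MKVLDE", "MRVIDE"))   # hypothetical peptides
print(is_permitted_variant("MKVLDE", "MKVLDW"))
```

The `max_conservative` parameter can be lowered to any of the narrower limits recited above, e.g., 10 or one.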

The methods can be performed in vitro, in vivo, or ex vivo. In vitro application of protein 
X can be useful, for example, in basic scientific studies of tumor cell biology, e.g., studies on 
cancer cell proliferation, survival, invasion, metastasis, or escape from immunological effector
mechanisms or studies on angiogenesis. In addition, protein X and the polynucleotides encoding 
protein X (DNA and/or RNA) can be used as "positive controls" in diagnostic assays (see 
below). However, the methods of the invention will preferably be in vivo or ex vivo (see below). 

Protein X and variants thereof are generally useful as cancer cell (e.g., breast cancer cell) 
pathogenesis-inhibiting therapeutics. They can be administered to mammalian subjects (e.g., 
human breast cancer patients) alone or in conjunction with other drugs and/or radiotherapy.

These methods of the invention can be applied to a wide range of species, e.g., humans, 
non-human primates, horses, cattle, pigs, sheep, goats, dogs, cats, rabbits, guinea pigs, hamsters, 
rats, and mice. 

In Vivo Approaches

In one in vivo approach, protein X (or a functional fragment thereof) itself is administered 
to the subject. Generally, the compounds of the invention will be suspended in a 
pharmaceutically-acceptable carrier (e.g., physiological saline) and administered orally or by
intravenous infusion, or injected subcutaneously, intramuscularly, intrathecally, intraperitoneally,
intrarectally, intravaginally, intranasally, intragastrically, intratracheally, or intrapulmonarily.
They are preferably delivered directly to tumor cells, e.g., to a tumor or a tumor bed following
surgical excision of the tumor, in order to kill any remaining tumor cells. The dosage required
depends on the choice of the route of administration; the nature of the formulation; the nature of
the patient's illness; the subject's size, weight, surface area, age, and sex; other drugs being
administered; and the judgment of the attending physician. Suitable dosages are in the range of
0.01-100.0 µg/kg. Wide variations in the needed dosage are to be expected in view of the variety
of polypeptides and fragments available and the differing efficiencies of various routes of 
administration. For example, oral administration would be expected to require higher dosages 
than administration by i.v. injection. Variations in these dosage levels can be adjusted using 
standard empirical routines for optimization as is well understood in the art. Administrations can 
be single or multiple (e.g., 2-, 3-, 4-, 6-, 8-, 10-, 20-, 50-, 100-, 150-, or more fold).
Encapsulation of the polypeptide in a suitable delivery vehicle (e.g., polymeric microparticles or 
implantable devices) may increase the efficiency of delivery, particularly for oral delivery. 

Alternatively, a polynucleotide containing a nucleic acid sequence encoding protein X or 
functional fragment thereof can be delivered to breast cancer cells in a mammal. Expression of 
the coding sequence will preferably be directed to lymphoid tissue of the subject by, for 
example, delivery of the polynucleotide to the lymphoid tissue. Expression of the coding 
sequence can be directed to any cell in the body of the subject. However, expression will 
preferably be directed to cells (e.g., stromal cells) in a tumor containing, or in the vicinity of, the 
cancer cells whose proliferation it is desired to inhibit. In certain embodiments, expression of
the coding sequence can be directed to the tumor cells themselves. This can be achieved by, for 
example, the use of polymeric, biodegradable microparticle or microcapsule delivery devices 
known in the art. 

Another way to achieve uptake of the nucleic acid is using liposomes (see section above 
on Methods of Inhibiting Expression of Genes).

In the relevant polynucleotides (e.g., expression vectors), the nucleic acid sequence
encoding protein X or functional fragment of interest with an initiator methionine and optionally
a targeting sequence is operatively linked to a promoter or enhancer-promoter combination.
Short amino acid sequences can act as signals to direct proteins to specific intracellular
compartments. Such signal sequences are described in detail in U.S. Patent No. 5,827,516, 
which is incorporated herein by reference in its entirety. 

Appropriate enhancers, vectors, and methods of administration of polynucleotides are 
described above in the section on Methods of Inhibiting Gene Expression.

Ex Vivo Approaches

An ex vivo strategy can involve transfecting or transducing cells obtained from the 
subject with a polynucleotide containing the protein X-encoding or functional fragment-encoding nucleic acid
sequences described above. The transfected or transduced cells are then returned to the subject. 
The cells can be any of a wide range of types including, without limitation, hemopoietic cells 
(including leukocytes) (e.g., bone marrow cells, macrophages, monocytes, dendritic cells, T 
cells, or B cells), fibroblasts, epithelial cells, endothelial cells, keratinocytes, or muscle cells.
Such cells act as a source of the protein X or functional fragment for as long as they survive in 
the subject. Alternatively, tumor cells, preferably obtained from the subject but potentially from 
an individual other than the subject, can be transfected or transformed by a vector encoding a 
protein X or functional fragment thereof. The tumor cells, preferably treated with an agent (e.g., 
ionizing irradiation) that ablates their proliferative capacity, are then introduced into the patient, 
where they secrete exogenous protein X. 

The ex vivo methods include the steps of harvesting cells from a subject, culturing the 
cells, transducing them with an expression vector, and maintaining the cells under conditions 
suitable for expression of the protein X polypeptide or functional fragment. These methods are
known in the art of molecular biology. The transduction step is accomplished by any standard 
means used for ex vivo gene therapy, including calcium phosphate, lipofection, electroporation, 
viral infection, and biolistic gene transfer. Alternatively, liposomes or polymeric microparticles 
can be used. Cells that have been successfully transduced can then be selected, for example, for 
expression of the coding sequence or of a drug resistance gene. The cells may then be lethally
irradiated (if desired) and injected or implanted into the patient. 

Arrays and Uses Thereof 

The invention features an array that includes a substrate having a plurality of addresses. 
At least one address of the plurality includes a capture probe that binds specifically to a nucleic
acid X or a protein X. The array can have a density of at least, or less than, 10, 20, 50, 100, 200,
500, 700, 1,000, 2,000, 5,000 or 10,000 or more addresses/cm², and ranges between. In a
preferred embodiment, the plurality of addresses includes at least 10, 100, 500, 1,000, 5,000,
10,000, or 50,000 addresses. In another preferred embodiment, the plurality of addresses includes equal
to or less than 10, 100, 500, 1,000, 5,000, 10,000, or 50,000 addresses. The substrate can be a 
two-dimensional substrate such as a glass slide, a wafer (e.g., silica or plastic), a mass 
spectroscopy plate, or a three-dimensional substrate such as a gel pad. Addresses in addition to 
the addresses of the plurality can be disposed on the array. 

In one embodiment, at least one address of the plurality includes a nucleic acid capture 
probe that hybridizes specifically to a nucleic acid X, e.g., the sense or anti-sense strand. 
Nucleic acids of interest include, without limitation, all or part of any of the genes identified by 
the tags listed in Tables 1-16, all or part of mRNAs transcribed from such genes, or all or part of 
cDNA produced from such mRNA. Useful probes can, for example, be or contain the nucleotide 
sequences of the tags listed in Tables 1-5, 7-10, 15 and 16. Each address of the subset can 
include a capture probe that hybridizes to a different region of a nucleic acid. Each address of 
the subset is unique, overlapping, and complementary to a different variant of gene X (e.g., an 
allelic variant, or all possible hypothetical variants). The array can be used to sequence gene X, 
mRNA X, or cDNA X by hybridization (see, e.g., U.S. Patent No. 5,695,940). 
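
By way of non-limiting illustration, the arrangement of overlapping capture probes described above can be sketched in Python as follows. The target sequence, probe length, and step size are hypothetical values chosen only for the example, and the function names `reverse_complement` and `tile_probes` are introduced here for illustration; they are not taken from the cited patent.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def tile_probes(target, probe_len, step):
    """Yield (offset, probe) pairs covering the sense-strand target.

    Each probe is the reverse complement of one window of the target,
    so it would hybridize to that window of the sense strand.
    """
    for start in range(0, len(target) - probe_len + 1, step):
        window = target[start:start + probe_len]
        yield start, reverse_complement(window)


# Hypothetical 20-nucleotide target (an assumption for this sketch).
target = "ATGTCTTTTCCATATCATTA"
probes = list(tile_probes(target, probe_len=10, step=5))
for offset, probe in probes:
    print(offset, probe)
```

With a 10-nucleotide probe and a 5-nucleotide step, adjacent probes overlap by 5 nucleotides, so every target position other than the 5 nucleotides at each end is covered by two addresses.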

An array can be generated by any of a variety of methods. Appropriate methods include, 
e.g., photolithographic methods (see, e.g., U.S. Patent Nos. 5,143,854; 5,510,270; and 
5,527,681), mechanical methods (e.g., directed-flow methods as described in U.S. Patent No. 
5,384,261), pin-based methods (e.g., as described in U.S. Pat. No. 5,288,514), and bead-based 
techniques (e.g., as described in PCT US/93/04145). 

In another embodiment, at least one address of the plurality includes a polypeptide 
capture probe that binds specifically to protein X or fragment thereof. The polypeptide can be a 
naturally-occurring interaction partner of protein X, e.g., a ligand for protein X where protein X 
is a receptor or a receptor for protein X where protein X is a ligand. Preferably, the polypeptide is 
an antibody, e.g., an antibody specific for protein X, such as a polyclonal antibody, a monoclonal 
antibody, or a single-chain antibody. 

In another aspect, the invention features a method of analyzing the expression of gene X. 
The method includes providing an array as described above; contacting the array with a sample; 
and detecting binding of a nucleic acid X or protein X to the array. In one embodiment, the 
array is a nucleic acid array. Optionally the method further includes amplifying nucleic acid 
from the sample prior to or during contact with the array. 

In another embodiment, the array can be used to assay gene expression in a tissue to 
ascertain tissue specificity of genes in the array, particularly the expression of gene X. If a 
sufficient number of diverse samples is analyzed, clustering (e.g., hierarchical clustering, k- 
means clustering, Bayesian clustering and the like) can be used to identify other genes which are 
co-regulated with gene X. For example, the array can be used for the quantitation of the 
expression of multiple genes. Thus, not only tissue specificity, but also the level of expression of 
a battery of genes in the tissue is ascertained. Quantitative data can be used to group (e.g., 
cluster) genes on the basis of their tissue expression per se and level of expression in that tissue. 
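
As a minimal sketch of how clustering can group co-regulated genes, the following Python example performs average-linkage hierarchical clustering with a correlation-based distance. It is an illustration under stated assumptions and is not the clustering software used by the inventors; the gene names, expression values, and the functions `pearson_distance` and `average_linkage` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 minus the Pearson correlation of two expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Closeness is the mean pairwise distance between cluster members.
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        def mean_dist(pair):
            a, b = pair
            dists = [pearson_distance(profiles[g], profiles[h])
                     for g in a for h in b]
            return sum(dists) / len(dists)

        a, b = min(combinations(clusters, 2), key=mean_dist)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression levels of four genes across four tissue samples.
profiles = {
    "geneX": [1, 2, 8, 9],
    "geneA": [2, 4, 15, 18],
    "geneB": [9, 8, 2, 1],
    "geneC": [20, 17, 5, 3],
}
print(average_linkage(profiles, n_clusters=2))
```

In this toy data set, the gene that rises and falls together with gene X across samples ends up in the same cluster as gene X, which is the sense in which clustering can suggest co-regulation.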

For example, array analysis of gene expression can be used to assess the effect of cell-cell 
interactions on gene X expression. A first tissue can be perturbed and nucleic acid from a second 
tissue that interacts with the first tissue can be analyzed. In this context, the effect of one cell 
type on another cell type in response to a biological stimulus can be determined, e.g., to monitor 
the effect of cell-cell interaction at the level of gene expression. 

Moreover, cells can be contacted with a therapeutic agent. The expression profile of the 
cells is determined using the array, and the expression profile is compared to the profile of like 
cells not contacted with the agent. For example, the assay can be used to determine or analyze 
the molecular basis of an undesirable effect of the therapeutic agent. If an agent is administered 
therapeutically to treat one cell type but has an undesirable effect on another cell type, the 
invention provides an assay to determine the molecular basis of the undesirable effect and thus 
provides the opportunity to co-administer a counteracting agent or otherwise treat the undesired 
effect. Similarly, even within a single cell type, undesirable biological effects can be determined 
at the molecular level. Thus, the effects of an agent on expression of genes other than the target gene 
can be ascertained and counteracted. 
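
The comparison of an expression profile of agent-treated cells with that of like untreated cells can be sketched, for illustration only, as a log2 fold-change calculation. The pseudocount, the two-fold threshold, the gene names, and the signal values below are assumptions made for the example and are not specified in this description.

```python
from math import log2


def log2_fold_changes(treated, control, pseudocount=1.0):
    """Return log2(treated/control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in treated
        if gene in control
    }


def changed_genes(treated, control, threshold=1.0):
    """Return genes whose expression changed at least 2**threshold-fold in either direction."""
    changes = log2_fold_changes(treated, control)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)


# Hypothetical array signals for untreated and agent-treated cells.
control = {"geneX": 99, "geneA": 49, "geneB": 10}
treated = {"geneX": 24, "geneA": 52, "geneB": 43}
print(changed_genes(treated, control))
```

Genes other than the intended target that appear in the returned list would be candidates for the undesired effects discussed above.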

In another embodiment, the array can be used to monitor expression of one or more genes 
in the array with respect to time. For example, samples obtained from different time points can 
be probed with the array. Such analysis can identify and/or characterize the development of a 
gene X-associated disease or disorder (e.g., breast cancer such as invasive breast cancer); and 
processes, such as a cellular transformation associated with a gene X-associated disease or 
disorder. The method can also evaluate the treatment and/or progression of a gene X-associated 
disease or disorder. 

The array is also useful for ascertaining differential expression patterns of one or more 
genes in normal and abnormal (e.g., malignant) cells. This provides a battery of genes (e.g., 
including gene X) that could serve as a molecular target for diagnosis or therapeutic intervention. 

In another aspect, the invention features an array having a plurality of addresses. Each 
address of the plurality includes a unique polypeptide. At least one address of the plurality has 
disposed thereon a protein or fragment thereof. Methods of producing polypeptide arrays are 
described in the art [e.g., in De Wildt et al. (2000) Nature Biotech. 18:989-994; Lueking et al. 
(1999) Anal. Biochem. 270:103-111; Ge, H. (2000) Nucleic Acids Res. 28, e3:I-VII; MacBeath, 
G. and Schreiber, S.L. (2000) Science 289:1760-1763; and WO 99/51773A1]. In a preferred 
embodiment, each address of the plurality has disposed thereon a polypeptide at least 60, 70, 
80, 85, 90, 95, or 99 % identical to protein X or fragment thereof. For example, multiple variants 
of protein X (e.g., encoded by allelic variants, site-directed mutants, random mutants, or 
combinatorial mutants) can be disposed at individual addresses of the plurality. Addresses in 
addition to the addresses of the plurality can be disposed on the array. 

The polypeptide array can be used to detect a protein X-binding compound, e.g., an 
antibody in a sample from a subject with specificity for protein X or the presence of a 
protein X-binding protein or ligand. 

The array is also useful for ascertaining the effect of the expression of a gene on the 
expression of other genes in the same cell or in different cells (e.g., ascertaining the effect of 
gene X expression on the expression of other genes). This provides, for example, for a selection 
of alternate molecular targets for therapeutic intervention if the ultimate or downstream target 
cannot be regulated. 

In another aspect, the invention features a method of analyzing a plurality of probes. The 
method is useful, e.g., for analyzing gene expression. The method includes: providing a first 
two dimensional array having a plurality of addresses, each address of the plurality being 
positionally distinguishable from each other address of the plurality and having a unique capture 
probe, e.g., wherein the capture probes are from a cell or subject which expresses gene X or from a 
cell or subject in which a gene X-mediated response has been elicited, e.g., by contact of the cell 
with nucleic acid X or protein X, or administration to the cell or subject of a nucleic acid X or 
protein X; providing a second two dimensional array having a plurality of addresses, each 
address of the plurality being positionally distinguishable from each other address of the 
plurality, and each address of the plurality having a unique capture probe, e.g., wherein the 
capture probes are from a cell or subject which does not express gene X (or does not express as 
highly as in the case of the cell or subject described above for the first array) or from a cell or 
subject in which a gene X-mediated response has not been elicited (or has been elicited to 
a lesser extent than in the first sample); contacting the first and second arrays with one or more 
inquiry probes (which are preferably other than a nucleic acid X, protein X, or antibody specific 
for protein X), and thereby evaluating the plurality of capture probes. Binding, e.g., in the case 
of a nucleic acid, hybridization with a capture probe at an address of the plurality, is detected, 
e.g., by signal generated from a label attached to the nucleic acid, polypeptide, or antibody. 

The invention also features a method of analyzing a plurality of probes or a sample. The 
method is useful, e.g., for analyzing gene expression. The method includes: providing a first two 
dimensional array having a plurality of addresses, each address of the plurality being positionally 
distinguishable from each other address of the plurality and having a unique capture probe; 
contacting the array with a first sample from a cell or subject which expresses or mis-expresses gene 
X or from a cell or subject in which a gene X-mediated response has been elicited, e.g., by 
contact of the cell with nucleic acid X or protein X, or administration to the cell or subject of 
nucleic acid X or protein X; providing a second two dimensional array having a plurality of 
addresses, each address of the plurality being positionally distinguishable from each other 
address of the plurality, and each address of the plurality having a unique capture probe, and 
contacting the array with a second sample from a cell or subject which does not express gene X 
(or does not express as highly as in the case of the cell or subject described 
for the first array) or from a cell or subject in which a gene X-mediated response has not 
been elicited (or has been elicited to a lesser extent than in the first sample); and comparing the 
binding of the first sample with the binding of the second sample. Binding, e.g., in the case of a 
nucleic acid, hybridization with a capture probe at an address of the plurality, is detected, e.g., by 
a signal generated from a label attached to the nucleic acid, polypeptide, or antibody. The same 
array can be used for both samples or different arrays can be used. If different arrays are used 
the same plurality of addresses with capture probes should be present on both arrays. 

In another aspect, the invention features a method of analyzing gene X, e.g., analyzing 
the structure, function, or relatedness to other nucleic acids or amino acid sequences. The 
method includes: providing a nucleic acid X or protein X amino acid sequence; comparing the 
nucleic acid or amino acid sequence with one or more sequences from a collection of sequences, 
e.g., a nucleic acid or protein sequence database; to thereby analyze gene X. 
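
For illustration only, the comparison of a sequence with a collection of sequences can be sketched as a percent-identity ranking. The sketch assumes the sequences have already been aligned to equal length (alignment itself is outside its scope), counts identity only over positions where neither sequence has a gap, and uses hypothetical peptide sequences and function names; database search tools used in practice employ more elaborate alignment and scoring.

```python
def percent_identity(a, b):
    """Return percent identical positions of two pre-aligned sequences.

    Positions where either sequence has a gap character "-" are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def rank_database(query, database):
    """Return (percent identity, name) pairs, most similar first."""
    scored = [(percent_identity(query, seq), name) for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical query and database entries (not sequences from the Tables).
query = "MKTLLVAGAL"
database = {
    "seqA": "MKTLLVAGAL",
    "seqB": "MKTLIVAGSL",
    "seqC": "MRSQLIPGAV",
}
print(rank_database(query, database))
```

The same percentage could be compared against an identity threshold, such as the 60 to 99% values recited above for polypeptides related to protein X.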

The following examples are meant to illustrate, not limit, the invention. 

EXAMPLES 

Example 1. Methods and Materials 
Tissue samples and tissue microarrays (TMA) 

All human tissue was collected following NIH guidelines and using protocols approved 
by the Institutional Review Boards of relevant institutions (see below). 

Fresh tissue specimens obtained from the Brigham and Women's Hospital, 
Massachusetts General Hospital, and Faulkner Hospital (all Boston, MA), Duke University 
(Durham, NC), University Hospital Zagreb (Zagreb, Croatia), and the National Disease Research 
Interchange (Philadelphia, PA) were snap frozen on dry ice and stored at -80°C until use. 
Tumors with significant DCIS components were identified based on pathology reports and 
confirmed by microscopic examination of hematoxylin-eosin stained frozen sections. Of the 
tumors used for SAGE analysis, D1, D3, D4, D5 and D6 were high-grade, comedo DCIS, and 
D2, D7 and T18 were intermediate-grade DCIS with no necrosis. Tumors used for mRNA in 
situ hybridization and immunohistochemistry included DCIS tumors of all three (low, 
intermediate, and high grade) histologic types. Most of the tumors used for in situ hybridization 
and immunohistochemistry were DCIS with concurrent invasive carcinoma and pure DCIS (i.e., 
without concurrent invasive carcinoma), respectively. Tumors D3 and D6 used for SAGE were 
pure DCIS. The larger representation of frozen/fresh DCIS tumors with concurrent invasive 
disease was due to logistic issues; it is extremely difficult to obtain frozen or fresh pure DCIS 
specimens, especially ones with long term clinical follow up data. For in situ hybridization, 
5 µm thick frozen sections were mounted on silylated slides (CEL Associates Inc, Pearland, TX), 
air dried, and stored at -80°C until use. 

Tissue microarrays (TMAs) were: (1) obtained from commercial sources (Imgenex, San 
Diego, CA (49 invasive breast tumors); Ambion, Austin, TX (92 primary invasive tumors and 41 
distant metastases)); (2) provided by the Cooperative Breast Cancer Tissue Resource, Rockville, 
MD (40 normal breast tissue samples, 10 pure DCIS tumors, 10 DCIS with concurrent invasive 
tumors, and 192 primary invasive breast tumors); (3) generated at Johns Hopkins University, 
Baltimore, MD (299 invasive breast tumors and 10 distant metastases) and at Beth Israel 
Deaconess Medical Center (30 invasive breast tumors and 70 pure DCIS tumors of different 
histologic grades, all with matched normal breast tissue) following published protocols 
[Kononen et al. (1998) Nat. Med. 4:844-847]. With the exception of the Imgenex and the DCIS 
arrays (1 mm punches), all TMAs contained 0.6 mm punches, with at least 2 punches/tumor in 
order to control for tumor and immunohistochemical staining heterogeneity. 

Cell lines 

Breast cancer cell lines were obtained from American Type Culture Collection (ATCC; 
Manassas, VA) or were generously provided by Drs. Steve Ethier (University of Michigan) and 
Arthur Pardee (Dana-Farber Cancer Institute). Cells were grown in media recommended by the 
provider. 

Generation and analysis of SAGE libraries from normal and malignant breast tissue 

SAGE libraries were generated from DCIS tumors and normal breast tissue and analyzed 
essentially as previously described as part of the National Cancer Institute Cancer Gene 
Anatomy Project [Porter et al. (2001) Cancer Res. 61:5697-5702; Krop et al. (2001) Proc. Natl. 
Acad. Sci. USA 98:9796-9801; Lai et al. (1999) Cancer Res. 59:5403-5407; and Boon et al. 
(2002) Proc. Natl. Acad. Sci. USA 99:11287-11292]. Two of the DCIS tumors were pure 
DCIS (D3 and D6) and the others were obtained from patients with concurrent invasive breast 
carcinomas. Epithelial cells from normal breast tissue (N1 and N2) and some tumors (D2, D3, 
D6, and D7) were purified using epithelial cell-specific monoclonal antibody (BerEP4)-coated 
magnetic beads (Dynal, Oslo, Norway); other tumors were macroscopically dissected based on 
adjacent hematoxylin-eosin stained slides. Approximately 50,000 SAGE tags were obtained 
from each library. For further analyses, libraries were normalized to the library with the highest 
tag number (89,541 total tags). Hierarchical clustering was applied to data using the Cluster 
program developed by Eisen et al. [Eisen et al. (1998) Proc. Natl. Acad. Sci. USA 95:14863-14868]. 
Differentially expressed genes were identified based on statistical analysis of comparisons of groups of normal 
(2 samples), DCIS (8 samples), and invasive breast cancer (9 samples) SAGE libraries using the 
SAGE2000 software [Velculescu et al. (1995) Science 270:484-487]. Similarly, for the 
identification of genes specifically expressed in DCIS or invasive breast cancer, the 8 DCIS 
samples were treated as a group and the 9 invasive or metastatic samples were treated as another 
group. First, the highest SAGE tag numbers in the two normal libraries (N1 and N2) were used as 
the cut-off, and tag numbers in the DCIS and invasive libraries above this "normal" value were 
evaluated using a two-sided Fisher's exact test without correction for multiple comparisons (see Table 4). In a 
second test, ROC (receiver operating characteristic) curve analysis was used to choose the "best" 
cut-off for values (Table 4). A ROC area of 0.50 is no better than chance and a ROC area of 
1.00 is the best possible. 
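
The normalization, "normal" cut-off, and ROC area described in this section can be illustrated with the following minimal Python sketch. It is not the SAGE2000 software; the library contents, tag names, and function names are hypothetical, and the ROC area is computed here by the rank-based (pairwise comparison) formulation, which is an assumption about how such an area may be obtained.

```python
def normalize_library(counts, target_total):
    """Scale raw tag counts so that the library sums to target_total."""
    total = sum(counts.values())
    return {tag: n * target_total / total for tag, n in counts.items()}


def count_above_cutoff(normal_values, group_values):
    """Count group libraries whose tag number exceeds the highest normal value."""
    cutoff = max(normal_values)
    return sum(1 for v in group_values if v > cutoff)


def roc_area(group_a, group_b):
    """Return the area under the ROC curve separating group_b from group_a.

    Equals the probability that a value drawn from group_b exceeds a value
    drawn from group_a, with ties counted as one half.
    """
    wins = sum((b > a) + 0.5 * (b == a) for a in group_a for b in group_b)
    return wins / (len(group_a) * len(group_b))


# Hypothetical raw libraries; each is scaled to the largest library total.
libraries = {
    "N1": {"TAG1": 50, "TAG2": 150, "TAG3": 300},
    "D1": {"TAG1": 400, "TAG2": 100, "TAG3": 500},
}
largest = max(sum(lib.values()) for lib in libraries.values())
normalized = {name: normalize_library(lib, largest) for name, lib in libraries.items()}

# Hypothetical normalized counts of one tag in each group of libraries.
normal = [2, 4]
dcis = [0, 3, 5, 9]
invasive = [4, 8, 12, 20]
print(count_above_cutoff(normal, dcis), count_above_cutoff(normal, invasive))
print(roc_area(dcis, invasive))
```

An area of 0.5 from `roc_area` corresponds to the "no better than chance" case noted above, and an area of 1.0 to complete separation of the two groups.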

mRNA in situ hybridization 

To generate templates for in vitro transcription reactions, 300-500 base pair fragments 
derived from the 3' untranslated region of the selected genes were PCR amplified and subcloned 
into the pZERO 1.0 expression vector (Invitrogen, Carlsbad, CA). pZERO 1.0 contains a 
multiple cloning site bounded by SP6 and T7 RNA polymerase promoters; therefore the same 
plasmid can be used for the generation of sense and anti-sense riboprobes for mRNA in situ 
hybridizations. Digoxigenin-labeled sense and anti-sense riboprobes were generated and mRNA in 
situ hybridization was performed as described [Qian et al. (2001) Genes Dev. 15:2533-2545; 
Porter et al. (2003a) Mol. Cancer Res. 1:362-375]. The hybridized sections were observed with 
a NIKON microscope, images were obtained using a SPOT CCD camera, and the images were 
processed with the Adobe (San Jose, CA) Photoshop program. Hybridizations were considered 
successful if the control sense probe gave no significant signal. The intensity and distribution of 
the hybridization signal were scored (0-3 for intensity and 0-3 for distribution using the scoring 
scheme described below for immunohistochemistry) independently by three investigators. 

Immunohistochemistry 

The expression of the indicated genes in primary breast tumors was determined by 
immunohistochemical analysis of eight tissue microarrays that contained evaluatable 
paraffin-embedded specimens derived from 80 DCIS, 675 primary invasive breast cancer, and 33 distant 
metastases. Antigen Retrieval Citra solution (Research Genetics, San Ramon, CA) and boiling in 
a microwave oven (5 minutes at high power) were used to enhance staining. Isotype control 
serum was used for negative control samples. A standard indirect immunoperoxidase protocol 
with 3,3'-diaminobenzidine as chromogen was used for the visualization of antibody binding 
(ABC-Elite; Vector Laboratories, Burlingame, CA). 

Primary antibodies used were as follows: mouse monoclonal antibody specific for human 
psoriasin ("anti-psoriasin") [Enerback et al. (2002) Cancer Res. 62:43-47]; affinity-purified 
rabbit polyclonal antibody specific for human Connective Tissue Growth Factor (CTGF) ("anti- 
CTGF") (a generous gift of Dr. D. Brigstock, Children's Research Institute, Columbus, OH); 
affinity-purified rabbit polyclonal antibody specific for human Trefoil Factor 3 (TFF3) ("anti- 
TFF3") (a kind gift of Prof. Hoffman, Universitaetsklinikum, Magdeburg, Germany); mouse 
monoclonal antibodies specific for human interleukin-8 (IL-8) ("anti-IL-8"), GRO-1 ("anti- 
GRO-1"), and GRO-2 ("anti-GRO-2") (R&D Systems, Minneapolis, MN); monoclonal antibody 
specific for human osteonectin (SPARC) ("anti-SPARC") (Hematologic Technologies, Essex 
Junction, VT); and monoclonal antibody specific for human fatty acid synthase (FASN) ("anti- 
FASN") (Transduction Labs, San Diego, CA). Mouse monoclonal antibodies specific for 
interleukin-1β (IL-1β) and CCL3 (chemokine (CC motif) ligand 3, also known as macrophage 
inflammatory protein 1α (MIP-1α)) were purchased from R&D (Minneapolis, MN) while anti-CD45 
mouse monoclonal antibody was obtained from DAKO (Carpinteria, CA). Antibodies were used 
at a 1:100 dilution in PBS (phosphate buffered saline) containing 10% heat-inactivated goat 
serum. 

Antibody staining was subjectively scored by three investigators independently on a scale 
of 0-3 for intensity (0=no staining, 1=faint signal, 2=moderate and 3=intense staining) and 0-3 
for extent (0=no, 1=<30%, 2=30-70%, and 3= >70% positive cells) of staining. Cumulative 
scores were obtained by adding the average intensity and extent scores assigned by the three 
independent observers. For statistical analyses a cumulative score at or above 3 was considered 
positive. Relationships between the expression of genes determined by mRNA in situ 
hybridization or immunohistochemistry were analyzed by Fisher's exact test without correction 
for multiple comparisons. 
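
The cumulative scoring rule and a two-sided Fisher's exact test on a 2x2 table of positive and negative tumors can be sketched as follows. This is an illustrative implementation under stated assumptions, not the software used by the inventors: the observer scores and the 2x2 counts are hypothetical, and the two-sided p-value is taken as the sum of the probabilities of all tables with the same margins that are no more probable than the observed table.

```python
from math import comb


def cumulative_score(intensity_scores, extent_scores):
    """Add the observers' average intensity score to their average extent score."""
    mean_intensity = sum(intensity_scores) / len(intensity_scores)
    mean_extent = sum(extent_scores) / len(extent_scores)
    return mean_intensity + mean_extent


def is_positive(intensity_scores, extent_scores, cutoff=3.0):
    """A cumulative score at or above the cutoff is considered positive."""
    return cumulative_score(intensity_scores, extent_scores) >= cutoff


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical scores from three observers for one tumor.
print(is_positive([2, 3, 2], [1, 1, 2]))
# Hypothetical table: rows = gene 1 positive/negative, columns = gene 2 positive/negative.
print(fisher_exact_two_sided(3, 1, 1, 3))
```
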

Statistical analyses of clinical correlates 

The relationship of gene expression to clinico-pathologic parameters and the association 
between the expression of different genes determined by immunohistochemistry were analyzed 
by the following statistical methods. 

The eight individual tissue microarray datasets and a combined dataset were analyzed for 
association of gene expression positivity and prognostic factors using a logistic regression model 
(with gene expression positivity as the outcome), and a forward, or step-up, selection procedure 
to determine the best fitting model. Clinico-pathologic factors analyzed were: expression of the 
estrogen and progesterone receptors and HER2 by immunohistochemistry, histologic grade, 
TNM (tumor, node, metastasis) stage, tumor size, number of positive lymph nodes, patient age, 
and overall and distant metastasis-free survival. If all patients or no patients with a particular 
level of a covariate demonstrated gene expression positivity, then the logistic regression did not 
converge and a significance level was obtained using Fisher's exact test. If, however, there 
remained some patients with and without gene expression positivity after deleting patients with 
the particular level of the covariate, then a step-up logistic regression was performed on them. 
The significance of the variables in the logistic regression models was tested using likelihood 
ratio tests. The cut-off used for entry into the model was α=0.05. In addition to the analyses 
described above, Kaplan-Meier curves were generated and Cox models were run for two datasets 
that contained survival information. Calculated times to distant failure and times to survival 
were used and were based on the failure/death and accession dates. 
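
As a non-limiting sketch of how a Kaplan-Meier curve is obtained from failure/death and accession dates, the following Python example computes follow-up times and the product-limit survival estimate. The patient data and function names are hypothetical, and the sketch omits confidence intervals and the Cox modeling mentioned above.

```python
from datetime import date


def follow_up_days(accession, end):
    """Days from the accession date to the failure/death or last follow-up date."""
    return (end - accession).days


def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  -- follow-up time for each patient
    events -- True if failure/death was observed, False if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_this_time = [e for tt, e in data if tt == t]
        deaths = sum(at_this_time)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both failures and censored patients leave the risk set after time t.
        at_risk -= len(at_this_time)
        i += len(at_this_time)
    return curve


# Hypothetical follow-up times (months); False marks a censored patient.
times = [5, 8, 8, 12, 15]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
print(curve)
print(follow_up_days(date(2000, 1, 1), date(2000, 1, 31)))
```

Curves computed separately for gene expression-positive and gene expression-negative patients could then be compared.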

Generation of SAGE libraries from epithelial and non-epithelial cells of normal breast and 
DCIS tissue 

The procedure described in this section was used to obtain the data described in 
Example 6. 

Some of the cell types present in normal and cancerous breast tissue comprise a minor 
fraction (a few percent) of all cells of the relevant tissue; thus, genes that are specifically 
expressed in such cell types may not be detected by analysis of the whole tissue. In order to 
analyze the comprehensive gene expression profiles of purified luminal epithelial cells, 
myoepithelial cells, endothelial cells, fibroblasts and leukocytes isolated from normal breast 
tissue and breast carcinomas using SAGE, a purification procedure that allows the isolation of 
pure cell populations was developed. A brief outline of the procedure is depicted in Fig. 1. In 
order to isolate specific cell types, antibodies specific for cell type-specific cell surface markers 
and magnetic beads were employed using well-established methods. Thus, luminal mammary 
epithelial cells were isolated using the BerEp4 monoclonal antibody, myoepithelial cells with a 
monoclonal antibody specific for CD10/CALLA, infiltrating leukocytes with a monoclonal 
antibody specific for the CD45 pan-leukocyte marker, and endothelial cells with the P1H12 
monoclonal antibody that binds to an endothelial-specific cell surface protein. Essentially all 
the cells separated as luminal cells from breast cancer samples would be breast cancer cells. 
Thus, as used herein, breast "stromal cells" are breast cells other than epithelial cells. No 
antibody specific for a cell surface marker specific for fibroblasts was identified. Therefore, on 
the assumption that after removal of the above listed cell types the "leftover" cells were enriched 
for fibroblasts, the leftover cells were considered to be a "fibroblast enriched" fraction. The 
success of the purification procedure and the purity of each cell fraction were confirmed by an RT- 
PCR (reverse transcription-polymerase chain reaction) analysis of RNA isolated from 1/10 of the 
cells using the cell type specific marker used for the isolation of the cells. Fig. 2 shows the 
results of such an RT-PCR analysis of RNA isolated from: (a) luminal epithelial cells 
("epithelium"), myoepithelial cells ("myoepithelium"), leukocytes, and endothelial cells 
("endothelium") purified as described above from two DCIS tumors (DCIS6 and DCIS7); and 
(b) leukocytes and endothelial cells ("endothelium") from normal breast tissue. The PCR phases 
of the RT-PCRs were carried out with oligonucleotide primers specific for β-actin ("BAC") and 
L19 (both constitutively expressed by all cells), HER2 (expressed by some breast cancers), 
CALLA (a myoepithelial cell marker), CD45 (a pan-leukocyte marker), and an endothelial cell 
surface protein ("CDH5"; an endothelial cell marker). PCRs were performed for 25, 30, and 35 
cycles. 

The cells not used for the RT-PCR analysis were used for the generation of micro-SAGE 
libraries. SAGE libraries were generated from luminal epithelial cells, myoepithelial cells, 
infiltrating lymphocytes, and endothelial cells from a normal breast reduction tissue (1 
library/cell type) and from DCIS luminal and myoepithelial cells, infiltrating lymphocytes and 
endothelial cells (2 different tumors; 2 libraries/cell type). Approximately 50,000 SAGE tags 
were obtained from each library, thereby enabling the analysis of thousands of unique 
transcripts. Based on these SAGE data, genes that are differentially expressed in specific cell 
types of normal and DCIS breast tissue were identified. 

Ligand binding, cell growth, migration and invasion assays 

N-terminal or C-terminal alkaline phosphatase (AP) CXCL14 fusion proteins were 
generated using the AP-TAG-5 expression vector (GenHunter, Nashville, TN). Mammalian 
cells were transfected with Fugene6 (Roche, Indianapolis, IN), Lipofectamine or Lipofectamine 
2000 (LifeTechnologies, Rockville, MD) reagents. In vivo and in vitro ligand binding assays 
were carried out on primary tissues and cell lines using AP-CXCL14 essentially as described 
[Flanagan et al. (1990) Cell 63:185-194; Porter et al. (2003b) Proc. Natl. Acad. Sci. USA 
100:10931-10936]. Briefly, frozen sections of various human specimens were fixed, incubated 
with either AP-CXCL14 fusion protein or AP control conditioned medium, rinsed, and then 
incubated with AP substrate forming a blue/purple precipitate. For in vitro assays, cells in 
suspension were incubated with conditioned media containing either AP alone or AP-CXCL14 fusion protein, 
rinsed, and then assayed for bound AP activity. 

To determine the effect of CXCL14 on cell growth, MDA-MB-231 and MCF10A cells 
were plated (4,000 cells/well) in a 24 well tissue culture plate and grown in conditioned medium 
containing AP or AP-CXCL14. Conditioned medium was generated by transfecting 293 cells 
with pAP-tag5 or pAP-CXCL14 plasmids and growing them in McCoy's medium supplemented 
with 10% fetal bovine serum (FBS) (used for MDA-MB-231 cells) or in MCF10A media 
(ATCC; used for MCF10A cells). Cells were counted (3 wells/time point) on days 1, 2, 4, 6, and 
8 after plating. 10 nM CXCL12 was used as a positive control in the experiment with 
MDA-MB-231 cells. The experiments were repeated three times. 

In order to determine if CXCL14 binding to breast cancer cells has an effect on cell 
migration and invasion, the ability of conditioned medium containing AP-CXCL14 or 
pCDNA3.1 expressing HA (hemagglutinin)-tagged CXCL14 to induce the migration and 
invasion of MDA-MB-231 cells was tested using BIOCOAT Matrigel invasion chambers 
essentially as previously described [Muller (2001) Nature 410:50-56]. For invasion assays, cells 
were plated at a concentration of 2.5x10⁴ cells/well and assayed 24 hours later. For migration 
assays, cells at a concentration of 1.25x10⁴ cells/well were used and cell numbers were 
determined 12 hours later. Conditioned media from cells transfected with pAP-Tag5 or 
pCDNA3.1 empty vectors were used as negative controls. 

Example 2. Normal and Cancerous Breast Transcriptomes Determined by SAGE 
Genes differentially expressed between normal and cancerous breast tissues were 
identified using SAGE. Confirming previous studies of the inventors using a smaller number of 
SAGE libraries [Porter et al. (2001) Cancer Res. 61:5697-5702], the most dramatic difference in 
gene expression patterns was found to occur at the normal to in situ carcinoma transition and 
involves the uniform down-regulation of 32 genes (Table 1); while 34 tags and their 
corresponding genes are shown in Table 1, two genes (encoding interleukin-8 and GRO1) were 
each represented by two tags. Table 1 shows data from two normal breast tissue samples (N1 
and N2), eight DCIS samples (D1-D7 and T18), six invasive breast cancer samples (I1-I6), two 
lymph node metastases (LN1 and LN2) from the same subjects that samples I1 and I2 were 
obtained from, and a lung metastasis (MET) from a breast cancer patient. In Table 1 and 
subsequent tables, Unigene identification numbers for relevant genes are shown in columns 
labeled "Unigene". The contents (e.g., nucleic acid sequences and amino acid sequences) of 
database submissions identified by all the listed Unigene identification numbers are incorporated 
herein by reference in their entirety. Since many of the genes whose expression was found to be 
down-regulated after the normal to in situ transition encode secreted proteins and genes related to 
epithelial cell differentiation, loss of the differentiated epithelial phenotype and abnormal 
autocrine/paracrine interactions appear to play an essential role in the initiation of breast 
tumorigenesis. 

The inventors also identified 144 genes up-regulated in a fraction of in situ, invasive and 
metastatic tumors (Table 2). The normal, DCIS, and lymph node samples studied in this analysis 
were the same as those shown in Table 1. Invasive breast cancer samples I1-I5 were the same as 
samples I1-I5 shown in Table 1 and T15 was an additional invasive breast cancer sample. 
Nearly 1/4 of the relevant SAGE tags currently have no database match, indicating that many 
transcripts specifically expressed in certain breast carcinomas remain to be identified. 

Table 2. Genes up-regulated in breast cancer 



Tag         Unigene  Gene

Secreted proteins and ECM related
ATGTCTTTTC  1516     insulin-like growth factor binding protein 4
CATATCATTA  119206   insulin-like growth factor binding protein 7
CTCCACCCGA  352107   trefoil factor 3 (intestinal)
ACGTTAAAGA  330370   dermcidin (BC-1)
ATTTTCTAAA  91011    anterior gradient 2 homolog
AGTGGTGCCT  230      fibromodulin
ATCTTGTTAC  287820   fibronectin 1
TTATGTTTAA  79914
CTCATCTGCT  82109
ACATTCCAAG  245188   tissue inhibitor of metalloproteinase 3
CCAGAGAGTG  180384   carboxypeptidase B1 (tissue)
TTTGGTTTTC  179573   collagen, type I, alpha 2
ACCAAAAACC  172928   collagen, type I, alpha 1
TGGAAATGAC  172928   collagen, type I, alpha 1
TTTGTTTTTA  3622     procollagen-proline, 2-oxoglutarate 4-dioxygenase
TGGCCCCAGG  258571   apolipoprotein C-I
CGACCCCACG  169401   apolipoprotein E
AACACAGCCT  170250
GAATTTCCCA  2253     complement component 2
CAAACTAACC  153261   immunoglobulin heavy constant mu
GAAATAAAGC  300697   immunoglobulin heavy constant gamma 3
AAACCCCAAT  181125   immunoglobulin lambda joining 3



Cell surface proteins/receptors
AAGCACAAAA  9963     TYRO protein tyrosine kinase binding protein
TGGTTTGCGT  6459     putative G-protein coupled receptor GPCR41
TACAATAAAC  9071     progesterone receptor membrane component 2
AGGAAGGAAC  323910   v-erb-b2
ACATTCTTTT  82226    glycoprotein (transmembrane) nmb
CACGCTGTAC  25450    solute carrier family 29
GTTCACATTA  84293    CD74 antigen
CAAGCAGGAC  179516   integral type I protein
TGCTGCCTGT  118110   bone marrow stromal cell antigen 2
CCCATCATCC  306122   glycoprotein, synaptic 2
GCAGTGGCCT  184276   solute carrier family 9

Cell cycle and apoptosis
AAAGTCTAGA  82932    cyclin D1
CTGGCGCCGA  183180   APC11 anaphase promoting complex subunit 11

Protein synthesis, transport and degradation
TTTCAGAGAG  75975    signal recognition particle 9kDa



Normal
N1 N2 Ave



D1 D2 D3 D4 D5 D6 D7 T18 Ave



36 
6 
854 
0 

73 
0 
0 

3 
3 

24 
9 

O 

3 

0 

3- 

0 

0 

3 

0 

0 

0 

0 



6 32 
6 63 
17 26 



59 9 

39 a- 

451 31 

0 0 



9 4 
3 42 
38 261 



14 20 
12 2 



129 459 
17 102 



0 
36 
9 
9 
0 
0 
6 
IS 
57 
6 
96 



0 13 12 0 0 

5 36 45 13 23 

0 17 18 1 3 

11 137 43 110 24 

2 7 8 

2 3 8 

23 188 TO 

0 38 6 



1 

0 

6 (3 28 
2 4 64 



14 12 85 57 



21 

22 
274 

0 

18 

9 

4 

6 

9 
10 

4 
61 
70 
70 
2 
16 
18 
£5 
5 
11 
113 
13 



Invasive 



I1 I2 I3 I4 I5 T15 Ave



13 29 
49 63 
369 124 

177 101 
13 17 
34 36 
2 4 

0 20 
4 5 
7 3 

107 115 

92 90 

92 71 

184 91 

7 *7 

87 58 

29 37 

29 17 

2 7 

172 70 

721 665 

163 87 



159 II 

83 3 

218 23 

27 4 

22 8 

14 3 

31 0- 

1 6 

40 1 

53 43 

78 3 



19 
28 
94 
0 

12 " 

70 

21 

25 

10 

15 

0 

158 



183 189 

254 40 

21 4 

45 92 

54 . 173 

160 84 

I 8 

0 0 

0 2442 

0 241 



20 12 

27 25 

9 5 

60 42 

4 9 

4 1 



3 
72 
6 
tl 
18 
157 



159 208 226 32 428 474 

29 15 12 30 13 

22 41 22 10 21 153 

4 8 17 1 15 4 

43 32 6 7 19 12 



8 
55 
285 
199 

2' 
22 

2 
16 
10 
6 
0 

138 
153 
252 
3 

81 
31 
4 

6 
320 
1445 
258- 



29 
12 
244 
0 
54 
6 
I 

6 

1 

9 
354 



70 48 

34 37 

87 39 
18 0 

28 32 

28 32 
46 7 
7 

13 V 

109 770 1 

10 38 



13 
28 
177 
66 
19 
18 
1 

[■ " 
7 
7 
119 
85 

« 
126 

7" 
47 
31 
19 

5 

176 
775. 
102 



14 

24 

20 
104 

10 

2 
203 

14 

6 

2 
31 



7 

37 
16 
12 
6 

9. 
72 
28 



78 41 | 
6 7 



15 
25 
IS 
40 
14 
5 

115 
19 

42 
5 

Z5 



63. 6 
42 2 



42 39 29 
7 29 2 



56 - 114 36 3 53 
22 17 19 II 15 28 



140 2 
28 20 I 



TTCTTGCTTA  169895   ubiquitin-conjugating enzyme E2L 6
GAGAGTGGGG  252259   ribosomal protein S3



13 9 
0 
0 



23 92 
3 7 
0 0 



34 25 
7 II 
0 14 



Transcription, chromatin, other nuclear proteins
TGAGCAAGCC  27801    zinc finger protein 278
CCTGTACCCC  32317    high-mobility group 20B
CCTTTCACAC  278589   general transcription factor II, i
CACCAGCATT  75847    CREBBP/EP300 inhibitory protein 1
TTTTGTAATT  75890    membrane-bound transcription factor protease
GTGCAGGGAG  79414    prostate epithelium-specific Ets transcription factor
ATGACTCAAG  239752   nuclear receptor subfamily 2
ATTGTTTATG  181153   high-mobility group nucleosomal binding domain 2
AAGGATGCCA  169946   GATA binding protein 3
CTTGTAATCC  183253   nucleolar RNA-associated protein
TAGTTTGTGG  78934    mutS homolog 2



51 71 83 48 
9 12 14 6 
18 4 0 0 



89 24 
6 26 
0 12 



41 


53 


60 


41 


51 


14 


4 


25 


5 


11 


6 


10 


25 


0 


12 



0 0 
0 0 



0 
3 
15 
1$ 
3 
21 
9 
18 
9 
72 
0 



I 2 

■ 3 9 

22 59 

22 18 

4 . 0 

57 33 

19 39 

55 55 

1 14 



18 II 

7 7 

27 24 

27 15 



47 37 

0 -9 



27 21 24 29 23 

33 60 43 47 

7 17 0 26 

14 19 7 

12 15 4 



21 



13 12 



16 
7 
35 
21 
15 
41 
48 
34 
38 
62 
10 



Signal transduction
CGGTCTTATG  75842    dual-specificity phosphorylation regulated kinase 1A
TGAAAAGCTT  2384     tumor protein D52
TTAAGAGGGA  178137   transducer of ERBB2, 1
TATTTCACCG  138860   Rho GTPase activating protein 1
GTCTTTCTTG  151536   RAB13, member RAS oncogene family
CCAGGGGAGA  278613   interferon, alpha-inducible protein 27
GAGCAGCGCC  112408   S100 calcium binding protein A7 (psoriasin 1)
GCTCTGCTTG  112408   S100 calcium binding protein A7 (psoriasin 1)
CGCCGACGAT  265827   interferon, alpha-inducible protein (IFI-6-16)
GTGTGTTTGT  118787   transforming growth factor, beta-induced, 68kD
CCAATAAAGT  101850   retinol binding protein 1, cellular
GTCTAGAATC  92384    vitamin A responsive; cytoskeleton related
ATCCGCGAGG  180142   calmodulin-like skin protein
GATTTTGCAC  274479   nucleoside diphosphate kinase 7

*The above sequences are SEQ ID NOs: 35-97,



ii 

3 
20 
15 

9 
34 
26 
31 
15 
24 
IP 



19 



13 



ZT 

3 
6 
0 
36 
1018- 3 
76 O* 
17 644 
0 
3 
6 
0 
6 



26- 47 
13 16 



373 16 
20 0 



5 IS 
0 1 
5 1 

0 6 

3 176 

1* 2 



0 0 

90 418 18 366 

10 6 10 

0 2 6 II 

25 6 1 4 

22 0 20 0 

7 0 6 1 



respectively 



ii 

49 44 
18 19 23 47 



18 21 
22 69 



27 22 



11 
0 

6 
0 

130 171 
13 II 
49 2B 
16 7 
47 25 
1 



12 8 
19 32 37 
21 5 | 
0 
0 
5 
21 
6 
21 
0 
4 



3 2 
109 25 I 



12 

9 

9 

31 

0 ' 

0 



2 
U 
13 
77 
0 
0 

'526 181 1 240 
10 9 14 
32 21 52 
10 5 12 
0 .0 I 7 
18 2 | 7 
Table 2. continued



Tag         Unigene  Gene

Metabolism
ACCTTGTGCC  878
TGCCGTTTTG  2006
CCGTCCTCAT  9857
GTTTCTATCA  12540
CAAATAAAAT  71465
GGAACTTTTA  43857
TTACCTTTTT  79222
TTGCGGAAAC  81029
TGATCTCCAA  83190
TTTGGTGTTT  83190
TTAACCCCTC  78224
GCTTTCATGA  89649
TACAGTATGT  170171
TCKKKJTTCTT 272499
TTACTTCCCC  184641
AAGAATCTGA  183435
GTCCCTGCCT  279837
AATATGTGGG  351375
GGAGCTCTGT  227750
GAAGGAGATA  171889
TCAGACTTTT  334305
TCTTGTAACT  256549

ESTs
CTGCAACCTA  374393
TGAGTGGTTT  29672
CACTGTGTTG  350475
TTAAGAAGTT  275360
GCGACAGTAA  170853
TCAACTTGAA  99244
TTTCTGGAGG  129943
GGGGCTGGAG  301685
GTCTCATTTC  90419
ACCGCCTGTG  79625
GAAGAACAGA  29341
TCGTAACGAG  11197
GTGATGGGGC  62620
CAGAGAAAAT  IBH44
GCCCACATCC  84753
GTATTTAACT  209065
GGCTGGTCTC  324844
AACACTTCTC  333526
AATAAAGAGA  28149
GAGAAACATT  267245
TTTGGTCTTT  109773
TGTGGTGGTG  83422
GAAAGATGCT  334370
TAGCAGACCC  349196



Normal        In situ                          Invasive                   Metastatic
N1 N2 Ave     D1 D2 D3 D4 D5 D6 D7 T18 Ave     I1 I2 I3 I4 I5 T15 Ave     LN1 LN2 MET Ave


















0 


2 


1 


4 


IS 


o 


20 


4 


I 


3 




7 


- 




1 


6 


110 


4 


28 


4 


95 


0 


33 


0 


2 


1 


Q 


48 


o 


I 


20 


7 


23 


2 


13 






3 


4 


19 


8 


9 


4 


13 


7 


• 8 


U 


7 


9 


2 


51 


3 


• 20 


18 


4 


5 


67 


22 






21 


7 


12 


56. 


42 


77 


34 


7 


39 


o 


2 


I 


g 


15 


o 


25 


49 


1 


7 


g 


13 




12 


26 


45 


19 


8 




12 


38 


2 


17 


2 


2 


2 


o 


24 


2 


19 


53 


4 


o 


5 


14 




8 


3 


40 


13 


12 


jj 


4 


6 


39 


16 


Q 


2 


1 


17 


36 




f 


5 




14 


25 


14 


9 


8 


26 


0 


60 


0 


17 


ID 


10 


3 


8 


o 


o 


0 










14 


q 




2 


4 


2 


4 


8 


18 


6 


16 




18 


3 


5 


9 


4 


5 


4 


4 


24 


o 


22 


27 


1 




7 


12 


43 


19 


8 


3 


18 


32 


tn 


22 


29 


11 


11 


16 


5 


10 


53 


63 






182 


3 1 






74 


168 


33 


105 


17 


314 


4 




254 


46 


21 


107 




o 


3 


g 


24 




57 


Jo 




28 


21 


21 


36 


41 


62 


14 


57 


12 




28 


(0 


4 


14 


2 


o 


1 


25 


° 










1 


3 




31 


57 


13 


6 


• 0 


32 


13 


18 


46 


9 


24 


o 


2 


1 






2 




20 




9 


12 


13 


16 


29 


13 


6 


29 


40 


Z2 


29 


6 


14 


17 


o 


" 5 


2 








3 




■ 4 


24 


228 


50 


4 


19 


87 


26 


?6 


56 


11 


4 


(6 


0 


7 


2 


2 


2 










Mi 
113 


° 


84 


0 


25- 


7 


13 


10 


0 


0 


0 


5 


0 


32 


0 


11 


2 


o 


I - 






Q 


138 


29 


9 


2 


0 


22 


29 


19 


10 


32 


43 


4 


23 


53 


4 


4 


20 


o 


o 


a 






3 






1 


3 


0 


10 


34 


20 


14 


17 


35 


0 


20 


71 


46 


2 


39 


o 


5 


2 




18 




10 


53 


1 


6 


S 


fj 


4 


13 


22 


8 


47 


0 


16 - 


4 


12 


11 


9 


1 1 


s 


8 


•38 


707 




19 


219 


2 


1 12 


23 


141 


325 


337 


77 


30 


185 


24 


163 


28 


1250 


14 


431 


4 


5 


4 




39 


5 


Yj 


27 




21 


14 


17 


18 


11 


30 


22 


29 


16 


- 21 


16 


31 


9 


19 


o 


o 


0 


* 




° 




o° 


° 


1 


0 


2 


9 


15 


14 


34 


4 


4 


13 


2 


23 


2 


9 


o 


o 


0 








15 






0 


28 


7 


2 


22 


1 


17 


0 


4 


8 


2 


0 


30 


11 


o 


o 


0 


q 












4 


2 




11 


13 


4 


1 


4 


48 


14 


22 


12 


2 


12 


0 


0 


o 


2 


O 


0 


1 


6 


0 


3 


0 


1 


2 


0 


6 


6 


7 


0 


4 


2 


0 


0 


1 


2 


0 


J 


11 


6 


2 


13 


9 


4 


.8 


9 


7 


2 


7 


8 


4 


7 


12 


7 


12 


16 


16 


IS 


0 


0 


0 ■ 


4 


0 


0 


3 


14 


0 


0 


2 




4 


3 


10 


12 


6 


i 


7 


2 


6 


5 


4 


4 


0 


2 


2 


3 


0 


4 


2 


1 


3 


18 


4 


9 


7 


12 


12 


7 


12 


10 


6 


21 


5 


II 


7 


0 


4 


15 


0 


3 


63 


0 


0 


0 


2 


i 10 


.2 


1 


35 


0 


18 


0 


13 


14 


6 


0 


7 


0 


0 


0 


4 


0 


0 


6 


16 


0 


5 


16 


' 6 


9 


B 


9 


3 


15 


20 


11 


2 


1 


4 




0 


0 


0 


21 


3 


3 


7 


4 


13 


0 


0 


6 


16 


19 


9 


3 


10 


0 


9 


28 


40 


16 


28 


2 


0 




IS 


3 


3 


4 


12 


6 




2 


6 


16 


12 


12 


6 


7 


4 


9 


20 


6 


13 


13 


a 


0 


0 


11 


6 


5 


13 


29 


6 


6 


4 


10 


2 


9 


14 


6 


7 


16 


9 


8 


13 


18 


13 


4 


0 


2 


8 


3 


2 


4 


23 


1 


33 


0 


9 


0 


13 


14 


3 


21 


0 


8 


0 


29 


0 


10 


2 


5 


3 


4- 


36 


2 




80 


4 


121 


19 


33 


4- 


7 


13 


19 


21 


12 


13 


6 


6 


9 


7 


0 


0 


0 


13 


3 


3 


4 


16 


0 


2 


2 


5 


4 


9 


14 


8 


6 


0 . 


7 


6 


15 


7 


9 


4 


2 


3 


11 


0 


0 


15 


8 


4 


3 


23 


8 


25 


8 


18 


19 


4 


12 


14 


22 


10 


16 


16 


a 


0 


1 


2 


12 


0 


13 


2 


0 


4 


11 


5 


.is 


3 


6 


5 


13 


0 


7 


20 


10 


.9 


13 


0 * 


2 


1 


40 


•9 


0 


jo' 


6 


7 


7 


21 


13 


4 


8 


9 


11 


18 


0 


8 


6 


10 


27 


14 


4 


0 


2 


0 


0 


3 


4 


0 


4 


1 


25 


5 


63 


26 


1 


12 


6 


48 


26 


49 




II 


20 


0 


0 


0 


17 


6 


3 


28 


12 


6 


8 


9 


11 


9 


16 


15 


6 


16 


0 


10 


20 


10 


18 


16 


2 


2 


2 


6 


6 


3 


6 


12 


2 


3 


U 


6 


18 


7 


10 


18 


12 


16 


13 


6 


18 


20 


14 


4 


0 


2 


2 


6 


0 


25 


8 


1 


2 


4 


6 


27 


19 


4 


0 


9 


4 


10 


18 


6 


4 


9 


0 . 


2 


1 


0 


3 


0 


6 


23 


0 


1 


60 


18 


7 


4 


21 


0 


31 


0 


10 


6 


6 


2 


3 


0 


2 


1 


17 


"0 


0 


4 


8 


1 


2 


2 


4 


7 


5 


14 


12 


13 


4 


9 


14 


12 


5 


10 


0 


0 


0 


3 


0 


3 


6 


10 


4 


4 


4 


5 


20 


28 


12 


15 


15 


24 


19 


10 


10 


0 


7 


5 


2 


4 


6 


3 


2 


55 


39 


7 


7 


4 


13 


87 


25 


18 


22 


13 


36 


34 


92 


18 


5 


38 


2 


0 




6 


48 


0 


1 


0 


1 


1 


0 


7 


29 


37 


I 


I 


1 


0 


12 


0 


162 


2 


54 


0 


0 


• 


0 


3 


3 


1. 


4 


2 


7 1 


12 


4 


13 


13 " 


12 


7 


4 


20 


12 


18 


1 


0 


6 



sorbitol dehydrogenase
glutathione S-transferase M3 (brain)
dicarbonyl/L-xylulose reductase

squalene epoxidase
similar to glucosamine-6-sulfatases
galactosidase, beta 1
biliverdin reductase A
fatty acid synthase
fatty acid synthase
ribonuclease, RNase A family, 1 (pancreatic)
epoxide hydrolase 1, microsomal (xenobiotic)
glutamate-ammonia ligase
dehydrogenase/reductase (SDR family) member 2
fatty acid desaturase 2
NADH dehydrogenase
glutathione S-transferase M2
cytochrome c oxidase subunit VIc
NADH dehydrogenase 1 beta subcomplex, 4
choline phosphotransferase 1
diacylglycerol O-acyltransferase homolog 2
nucleotide binding protein 2

ESTs
ESTs
ESTs
EST clone IMAGE:4430514
ESTs
ESTs
ESTs
KIAA0S45 protein
KIAA0620 protein
KIAA0882 protein
chromosome 20 open reading frame 149
chromosome 20 open reading frame 81
chromosome 20 open reading frame 92
chromosome 6 open reading frame 1
hypothetical protein LOC51235
hypothetical protein FLJ12442
hypothetical protein FLJ14225
hypothetical protein IMAGE3455200
hypothetical protein MGC14832
hypothetical protein BC010626
hypothetical protein FLJ14803
hypothetical protein FLJ20625
MLN51 protein
brain expressed, X-linked 1
myeloid/lymphoid or mixed-lineage leukemia

*The above sequences are SEQ ID NOs: 98-144, respectively
Table 2. continued





                                         Normal         In situ                                      Invasive                          Metastatic
Tag         Unigene  Gene                N1   N2  Ave   D1   D2   D3   D4   D5   D6   D7  T18  Ave   I1   I2   I3   I4   I5  T15  Ave  LN1  LN2  MET  Ave
AACGCTGCGA  NA       No reliable match    7    5    6   36   24    0    4   35    1   10    0   14   31   60   23    1   19   --   22   29  101   23   51
AATGGATGAA  NA       No reliable match    0    0    0   38    0    0    3    2    1    0   44   11    2    0    0    0    0   60   10    4    0    2   --
ACATCGTAGT  NA       No reliable match    0    0    0    0   15    0    3   31    0    2    2    7   13   20    4   --   --    4    9    0   60    0   20
ACGCGCCGGG  NA       No reliable match   11    7    9  103   18    3    4    0   --    5  166   38   20    9   --   --    4  193   38   31   23    0   18
AGTGCAGGGA  NA       No reliable match    0    0    0    2    0   --   15    2    0    0   37    7   38   --   23    1    1   48   20   26    0    7   11
ATCAAGAATC  NA       No reliable match   --    0    1    2    3    3   --    9    0    3    9    5   18   13   15   --   16   11   23   22   13   13   16
ATGTGGCACA  NA       No reliable match    4    2    3    2   24    0   20   31    1    9   34   15   18   16   12   44   23    8   20   14   13    9   12
CAAACCTTTA  NA       No reliable match    0    0    0   11   --    0   16   25    1   --    0    8   16   16   --   --   13    3   15   33   15   34   27
CAATGCTGCC  NA       No reliable match   11   12   11   33   12    3   23   33    9    3   64   23   --  145   18   18   26   44  139  588   28   11  209
CAGCTTAATT  NA       No reliable match   --   --    3   --   --   --   25   --   --   --   --    7   37   20    0    0    4    4   11   90    6    5   34
CCGACGGGCG  NA       No reliable match    4    2    3   67    3    0    3    0    1    4   87   21    7    0    0    0    0  181   31    4    7    0    4
CCTTTGAACA  NA       No reliable match    2    0    1    4    6    5    0   10    2    3   14   --    9   13    3   12    6   16   --    2    4    4    3
CCTTTGCCCT  NA       No reliable match    0    0    0    0    9    2   73   16    1   14    5   15   27   26   19    0    9    0   14   28    9    0   12
CGGTTTAATT  NA       No reliable match    2    0    1   23    0    0   12   10    1    3   53   19   13    9   26    3   25   16   15   20    0    0    7
CTTTATTCCA  NA       No reliable match    0    0    0   19    0    2   48    2    0    0    5    9   25   22   31    4   16    0   16   18   15    3   13
GAAGTCGGAA  NA       No reliable match    4    0    2   48    0    2    3    2   27    3    2   11   20    3    4   12    4    0    7   18    9    7   11
GATCTCGCAA  NA       No reliable match    4    7    5   44   21    0   31   25    7    1    0   16   40   13   12   22   16    4   18   47   38   64   50
GCAGCTCCTA  NA       No reliable match    2    0    1    8    9    2    7   12    4    1    2    6   13   12    6   11   10    0    9   --    6    7   11
GCCGTGAGCA  NA       No reliable match    2    0    1   17   12    0    6    8    2    1    5    6   25   17    1    6   13    0   10   12   31   20   21
GGAAAGTGAC  NA       No reliable match    0    0    0    2    6    2    4   10    0    5    7    5   11   22   12    6   26    0   13   12   23    9   15
GGACCTTTAT  NA       No reliable match    2    0    1   23    3    0    1   23   --    0   37   11    2    1    1    0    1    0    1    4    3    0    2
GGCAGACAAT  NA       No reliable match    0    0    0   13    0    0   12   14    1    2    7    6   16    5    1   15    7    0    7   18   12   13   14
GGCAGCACAA  NA       No reliable match    0    5    2   25   18    0   15   27   20   12    5   15   49   11    5   12    6    4   15   35   25   29   30
GGTAGCTGCT  NA       No reliable match    0    0    0    6    3    0    3   20    0    6   14    7    7    4    4    4    3    0    4    2   --    4    2
GGTAGTTTTA  NA       No reliable match   13    0    6   59   21    3   32   41    2   13   18   24   18   28   39    0   59   16   26   15   79    0   32
GGTCAGTCGG  NA       No reliable match    5    5    5   76   15    2    0    0   39    3  102   30   25    3   --    7    1   80   20   18   13    2   11
GTAATCCTCC  NA       No reliable match    4    2    3   34    6   12    0    4  187   28   51   40   22   17    6   25    1   52   21   24    7    7   13
GTAGTTACTG  NA       No reliable match    2    2    2    8  120    0    1   25    0   21    4   22   33   33   13    7   19    0   18    8  172    4   61
TCACAGTGCC  NA       No reliable match    2    2    2   15    3    2   13   39    1    7   14   12   29    5   42   28   21    6   22   20    6   13   13
TCTGGTTTGT  NA       No reliable match    2    2    2    6   12    3   10   33    5    2    7   10   29   10    4   50    3   12   19   41    6    7   18
TGAAGCAGTA  NA       No reliable match    4    2    3   99    3    2   36   27    9    5   25   26   74   46  122   57   85   12   66   57   40   25   41
TGTCATAGTT  NA       No reliable match    0    0    0    0   15    0    9   55    0    3    9   11   34   42    9    4   34    4   21    6  197    0   68
TTACGATGAA  NA       No reliable match    2    0    1    0    6    0    3   18    1    1    0    4   51   41    4    1    7    0   18   73    9    2   28
TTCGGTTGGT  NA       No reliable match    2    0    1  101    3    0   55   16    0    0    7   23   58   40   40    1   60    4   34   22   11   29   --

Ave = average number of SAGE tags/histologic stage.

*The above sequences are SEQ ID NOs: 145-178, respectively
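
The "Ave" columns can be reproduced from the per-library counts. The following is a minimal sketch, assuming that each average is the arithmetic mean of the libraries of one histologic stage rounded to the nearest integer; the rounding rule and the function name `stage_averages` are assumptions, not details stated in the specification. The counts are those listed in Table 2 for the tag CCGACGGGCG.

```python
from statistics import mean

# Tag counts for one Table 2 row (tag CCGACGGGCG), grouped by histologic stage.
counts_by_stage = {
    "normal": [4, 2],                         # N1, N2
    "in situ": [67, 3, 0, 3, 0, 1, 4, 87],    # D1-D7, T18
    "invasive": [7, 0, 0, 0, 0, 181],         # I1-I5, T15
    "metastatic": [4, 7, 0],                  # LN1, LN2, MET
}


def stage_averages(counts):
    """Mean tag count per stage, rounded to the nearest integer (assumed rule)."""
    return {stage: round(mean(values)) for stage, values in counts.items()}


averages = stage_averages(counts_by_stage)
print(averages)
# {'normal': 3, 'in situ': 21, 'invasive': 31, 'metastatic': 4}
```

These values agree with the Ave entries listed for that tag (3, 21, 31 and 4).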
To identify overall similarities and differences among samples, the 19 SAGE libraries
were analyzed by hierarchical clustering (Fig. 3A). A dendrogram created by this analysis
revealed that the two normal samples (N1 and N2) were more similar to each other than to
any other sample, that the primary invasive tumor and lymph node metastasis from the first patient
(I1 and LN1) were more similar to each other than to any other sample, and that the primary invasive
tumor and lymph node metastasis from the second patient (I2 and LN2) were more similar to
each other than to any other sample. In situ tumors, invasive tumors, and metastases did not form
distinct clusters, suggesting that in none of these tumor classes is there a pronounced and common
"in situ", "invasive", or "metastasis" signature. Consistent with this observation, clustering and
other statistical analyses failed to identify any gene that was universally and specifically up- or
down-regulated in DCIS, invasive, or metastatic tumors (Fig. 3A). These findings confirm
previous studies performed in invasive breast carcinomas and highlight the fact that DCIS
tumors are just as heterogeneous at the molecular level as their invasive counterparts [Perou et al.
(2000) Nature 406:747-752].
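
For illustration only, the kind of analysis described above can be sketched as average-linkage agglomerative clustering over a correlation-based distance. The specification does not state which distance measure or linkage rule the clustering program used, so both are assumptions here, and the toy counts below are hypothetical values chosen to mimic the described pattern (they are not data from the tables).

```python
from itertools import combinations
from math import sqrt


def pearson_distance(a, b):
    """1 - Pearson correlation of two equal-length count vectors.

    Assumes neither vector is constant (non-zero variance).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / sqrt(var_a * var_b)


def average_linkage(libraries):
    """Agglomerative clustering; returns the merges in the order performed.

    Each merge is a pair of clusters, and each cluster is a tuple of names.
    """
    clusters = [(name,) for name in libraries]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            left, right = pair
            dists = [pearson_distance(libraries[x], libraries[y])
                     for x in left for y in right]
            return sum(dists) / len(dists)

        left, right = min(combinations(clusters, 2), key=linkage)
        merges.append((left, right))
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
    return merges


# Hypothetical tag counts for five libraries over five tags.
toy_libraries = {
    "N1": [50, 40, 2, 1, 0],
    "N2": [48, 42, 3, 0, 1],
    "I1": [5, 3, 60, 30, 2],
    "LN1": [4, 2, 58, 33, 3],
    "D1": [10, 5, 20, 5, 40],
}

merges = average_linkage(toy_libraries)
for left, right in merges:
    print(left, "+", right)
```

With these toy counts the two normal libraries merge first and the matched primary tumor and lymph node metastasis merge next, which is the pattern reported for the dendrogram.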

To analyze the relationships among DCIS tumors in more detail, hierarchical clustering
was performed using the eight DCIS libraries (Fig. 3B). The expression profiles of 582 genes
(Table 3) were included in this analysis; while 920 SAGE tags and their corresponding genes are
listed in Table 3, many of the genes are represented by more than one tag. The program used for
the clustering analysis (see Example 1) filtered for tags at least ten copies of which were present
in at least one library and which were present in at least one library in a number at least ten-fold
higher than in a library from another category of breast tissue. Genes expressed by
non-epithelial cells apparently play a predominant role in defining the relatedness of samples since
the BerEP4 purified (D2, D3, D6, and D7) and unpurified (D1, D4, D5, and T18) tumors formed
two distinct clusters. Tumors also appeared to cluster according to their histologic grade, with the
high-grade DCIS tumors (D3, D6, D4, and D5) and the intermediate-grade DCIS tumors (D2 and D7)
each showing highest similarity to tumors of the same grade. However, T18, an intermediate grade, non-comedo
DCIS, showed highest similarity to D1, a high grade comedo DCIS, suggesting that, despite its
histologic features, this DCIS has the molecular profile of a high grade, comedo
DCIS.
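
The tag filter described above (at least ten copies in at least one library, and at least a ten-fold difference between libraries of different categories) can be sketched as follows. This is a hedged illustration rather than the program of Example 1: the function name `passes_filter`, the category labels, and the treatment of a zero count as a floor of one copy are all assumptions made for the sketch.

```python
def passes_filter(counts, category, min_copies=10, min_fold=10, floor=1):
    """Return True if a tag meets both filter criteria.

    counts:   {library name: tag count}
    category: {library name: tissue category}
    floor:    assumed stand-in for a zero count when computing fold change
    """
    # Criterion 1: at least `min_copies` copies in at least one library.
    if max(counts.values()) < min_copies:
        return False
    # Criterion 2: some library holds at least `min_fold` times the count
    # of a library belonging to a different category.
    for high_lib, high in counts.items():
        for low_lib, low in counts.items():
            if category[high_lib] == category[low_lib]:
                continue
            if high >= min_fold * max(low, floor):
                return True
    return False


# Hypothetical category labels and counts, for illustration only.
toy_category = {"D1": "category A", "D2": "category B"}

print(passes_filter({"D1": 40, "D2": 3}, toy_category))   # True
print(passes_filter({"D1": 40, "D2": 12}, toy_category))  # False
print(passes_filter({"D1": 9, "D2": 0}, toy_category))    # False
```

The first tag passes both criteria, the second fails the ten-fold criterion, and the third never reaches ten copies in any library.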






WO 2004/085621 



PCT/US2004/008866 



Table 3. Genes employed for the clustering analysis shown in Fig. 3B 



SEQ ID NO:  Tag         Unigene  Gene name
179         AGCGACAAAC  82109    syndecan 1
180         AGGAAGGAAC  323910   v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian)
181         CTGTTCCGGC  286192   dopamine and cAMP-regulated neuronal phosphoprotein 32
182         ATCGCTTTCT  177486   amyloid beta (A4) precursor protein (protease nexin-II, Alzheimer disease)
183         GTGGCCACGG  112405   S100 calcium binding protein A9 (calgranulin B)
184         ATGTGAAGAG  111779   secreted protein, acidic, cysteine-rich (osteonectin)
185         ATGTGAAGAG  126515   EST
186         TGAAGCAGTA  176626   hemogen
187         TGAAGCAGTA  326248   programmed cell death 4 (neoplastic transformation inhibitor)
188         ACCAAAAACC  172928   collagen, type I, alpha 1
189         TTTGCACCTT  75511    connective tissue growth factor
190         TTTGGTTTTC  21431    suppressor of fused homolog (Drosophila)
191         TTTGGTTTTC  179573   retinoblastoma binding protein 1
192         TGGAAATGAC  172928   collagen, type I, alpha 1
193         TGGAAATGAC  173648   ESTs, Weakly similar to zinc finger protein ZNF287 [Homo sapiens] [H.sapiens]
194         GGGCATCTCT  76807    major histocompatibility complex, class II, DR alpha
195         TTGCTGACTT  108885   collagen, type VI, alpha 1
196         TTGCTGACTT  238928   HT002 protein; hypertension-related calcium-regulated gene
197         TTTCAGAGAG  75975    signal recognition particle 9kD
198         TTTCAGAGAG  355743   ESTs, Highly similar to SR09_HUMAN Signal recognition particle 9 kDa protein (SRP9) [H.sapiens]
199         AACTGCTTCA  11538    actin related protein 2/3 complex, subunit 1B (41 kD)
200         ACTTACCTGC  12504    likely ortholog of mouse Arkadia
201         ACTTACCTGC  174031   cytochrome c oxidase subunit VIb
202         TGTGGTGGTG  83422    MLN51 protein
203         TGTGGTGGTG  223618   EST
204         TTACTTCCGC  184641   fatty acid desaturase 2
205         CATTTCAATA  75431    fibrinogen, gamma polypeptide
206         CATTTCAATA  32587    steroid receptor RNA activator 1
207         GTGCTGATTC  75584    polymyositis/scleroderma autoantigen 2 (100kD)
208         GTGCTGATTC  1640     collagen, type VII, alpha 1 (epidermolysis bullosa, dystrophic, dominant and recessive)
209         CGACCCCACG  169401   apolipoprotein E
210         TTTTGTAACT  256549   nucleotide binding protein 2 (MinD homolog, E. coli)
211         TCTAAGTACG
212         CTTCCTTGCC  2785     keratin 17
213         CTTCCTTGCC  272572   hemoglobin, alpha 1
214         TTAAGAAGTT  275360   ESTs
215         GCTCTGCTTG  112408   S100 calcium binding protein A7 (psoriasin 1)
216         ATTAAGAGGG
217         GAGCAGCGCC  112408   S100 calcium binding protein A7 (psoriasin 1)
218         CCTGGGAAGT  12035    ESTs, Weakly similar to 2004399A chromosomal protein [Homo sapiens] [H.sapiens]
219         CCTGGGAAGT  89603    mucin 1, transmembrane
220         CAAACTAACC  75813    polycystic kidney disease 1 (autosomal dominant)
221         CAAACTAACC  153261   immunoglobulin heavy constant mu
222         AAACCCCAAT  8997     Sad1 unc-84 domain protein 1
223         AAACCCCAAT  77735    hypothetical protein FLJ11618
224         GAAATAAAGC  300697   immunoglobulin heavy constant gamma 3 (G3m marker)
225         GAAATAAAGC  111334   ferritin, light polypeptide
226         AAGGGAGCAC  181125   immunoglobulin lambda locus
227         AAGGGAGCAC  8997     Sad1 unc-84 domain protein 1
228         GGAGTGTGCT  9615     myosin, light polypeptide 9, regulatory
229         CATATCATTA  119206   insulin-like growth factor binding protein 7
230         TTTTTAATGT  181307   H3 histone, family 3A
231         TTTTTAATGT  356202   ESTs, Highly similar to S06250 histone H3 [similarity]
232         CTCCCCCAAG










Table 3. Genes employed for the clustering analysis shown in Fig. 3B 




235 
236 




255 
256 
257 
258 
259 



268 
269 



270 
271 . 



GTTCACATTA 



GTACGTATTC 



GTGATGGTGT 

GTGATGGTGT 

TGAGGGAATA.. 

GGCACAGTAA 

GGCACAGTAA 



CGGTTTAATT 
TTTCTAGTTT 



CTGGAGGCTG 



84298 



immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides



1 97345 
3352 
83848 
11270 
49169 



U1894 



98967 



CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)



thyroid autoantigen 70kD (Ku antigen) 
histone deacetylase 2
triosephosphate isomerase 1 
hypothetical protein MGC2491 
KIAA1634 protein 




RAB25, member RAS oncogene family 



lysosomal-associated protein transmembrane 4 alpha 



ATPase, H+ transporting, lysosomal VP subunit a isoform 4 



273 



CCTAGCTGGA 
CCTAGCTGGA 



rhophilin 1 



356332 



ESTs, Moderat ely similar to S71220 peptidylprolyl isomerase (EC 5,2, 1,8) RQCf 



274 



TTACCTCCTT 



peptidylprolyl isomerase A (cyclophilin A)



275 



CAATTAAAAG 



276 
277 



36475 



Homo sapiens, clone MGC:8772 IMAGE:3862861, mRNA, complete cds



CAATTAAAAG 



149923 



Homo sapiens cDNA FLJ36837 fis, clone ASTRO2011422



CCTTTCACAC 



278589' 



X-box binding protein 1 



general transcription factor II, i 



279 



TCGGTTGGT 



.24809 



Homo sapiens cDNA FLJ25021 fis, clone CBL01740



hypothetical protein FLJ10826 



280 
281 



GGTAGTTTJA 



82302 



GTAGACACCT 
f 



Homo sapiens cDNA FLJ32144 fis, clone PLACE5000105, highly similar to Mus musculus mRNA for
heparan sulfate 6-sulfotransferase 2



ribosomal protein L7 



283 



284 
285 



TAATTTGT 
AAGTTGCTAT 



220689 



golgi phosphoprotein 2 



78575 



Ras-GTPase-activating protein SH3-domain-binding protein 



prosaposin (variant Gaucher disease and variant metachromatic leukodystrophy)



286         GGAATGTACG



429 



phospholipid scramblase 3 



I ATP synthase, H+ transporting, mitochondrial F0 complex, subunit c (subunit 91 isof^nTT 
Table 3. Genes employed for the clustering analysis shown in Fig. 3B



SEQ ID NO:  Tag         Unigene  Gene name
287         CAAGCAGGAC  179516   integral type I protein



289 
290 



291 



292 



CACCACGGTG 



ESTs, Highly similar to HSHU33 histone H3.3 



241471 



RNB6 



TACAGTATGT 



CTGTTGGTGA 



170171 
3463 



glutamate-ammonia ligase (glutamine synthase) 



CTGTTGGTGA 



356628 



ribosomal protein S23 



ESTs, Moderately similar to T48317 hypothetical protein F9G14.270



294 
295 



296 
297 



298 



299 



300 



TGTATGAATT 



Homo sapiens, clone IMAGE:4617948, mRNA 



28777 



CTCGCGCTGG 



40369 



H2A histone family, member L 



CTCGCGCTGG 



25640 



Homo sapiens cDNA FLJ33345 fis, clone BRACE2003713



claudin 3 



GGTGAGACAC 



164280 



GGTGAGACAC 



solute carrier family 25 (mitochondrial carrier; adenine nucleotide translocator), member 6



GGGGTAAGAA 



80423 



Homo sapiens cDNA FLJ30227 fis, clone BRACE2001865



GCAGCCATCC 



4437 



TGCTGGTGTG 



prostatic binding protein 
ribosomal protein L28 



298573 



KIAA1720 protein 



302 
303 



KIAA0864 protein 



356767 
29797 



ESTs, Weakly similar to 60S ribosomal protein L10, putative [Arabidopsis thaliana] [A.thaliana]



305 



306 



GTAGGGGTAA 



ribosomal protein L10 



CTTGAGCAAT 



848 



FK506 binding protein 4 (59kD) 



75725 



thiopurine S-methyltransferase 



308 
309 



lectin, galactoside-binding, soluble, 1 (galectin 1) 



vesicle-associated membrane protein 8 (endobrevin) 



311 



312 



313 



314 
315 



316 



25197 



GGGCCCAGGA 



118983 



CAAGGGCCAA 



170160 



STIP1 homology and U-Box containing protein 1 
hypothetical protein FLJ12150



GCAAAAGAAA 



1265 



RAB2, member RAS oncogene family-like 



GCAAAAGAAA 



155543 



branched chain keto acid dehydrogenase E1, beta polypeptide (maple syrup urine disease)



CTCCACCCGA 



82961 




proteasome (prosome, macropain) 26S subunit, non-ATPase, 7 (Mov34 homolog) 



Trefoil factor 3 



AATATGTGGG 



98664 



ESTs, Moderately similar to COXH_HUMAN Cytochrome c oxidase polypeptide VIc precursor [H.sapiens]



318 
319 



320 



321 



GTAGTTACTG 



351875 
269021 



cytochrome c oxidase subunit Vic 



ESTs 



TGGCAACCTT 



279952 



TGGCAACCTT 



75117 



glutathione S-transferase subunit 13 homolog 



TGTCATAGTT 



interleukin enhancer binding factor 2, 45kD 



323- 
324 



279837 



GTCCCTGCCT 



301961 



glutathione S-transferase M2 (muscle) 



ATTGTTTATG 



glutathione S-transferase Ml 



18U63 



high-mobility group (nonhistone chromosomal) protein 17 



33317 



KIAA1393 protein 



326 
327 



GCCTGCTGGG 



2706 



glutathione peroxidase 4 (phospholipid hydroperoxidase) 



118110 



bone marrow stromal cell antigen 2 



145477 



HCGIV-6 protein 
Table 3. Genes employed for the clustering analysis shown in Fig. 3B




Tag 


Unigene 


Gene name 


347 

J*TWi 


A (~wTf^ i'lYVT 
AVJ I U^L»VJ lul 


/oJyl 


myxovirus (influenza virus) resistance 1, interferon-inducible protein p78 (mouse) 


J**J 


A I vJVJ^ 1 VJVJ 1 A 






344 




JJHOOO 


hypothetical protein FLJ23209






1 IA1 /* A 

1 IV 140 


eukaryotic translation initiation factor 5A 


74/; 

J*rO 


i 1 VJxJ i vJAAGG 


75968


thymosin, beta 4, X chromosome


347 


ttywto a a nn 


J 5 6629 


Homo sapiens cDNA FLJ31414 fis, clone NT2NE2000260, weakly similar to THYMOSIN BETA-4


74R 
j**o 


TAPiPTrTATP. 
1 AuL 1 1 A 1 Lr 


76549 


ATPase, Na+/K+ transporting, alpha 1 polypeptide


740 


A ATA A Art A /I A 
AA I AAAG AG A 


28149 


hypothetical protein BC010626


jjU 


a At* a a a n a n a 
AA 1 AAAUAuA 


337535 


ESTs


151 


PAAATAAAAA 

"uAAAl AAAAA , 


1116 


lymphotoxin beta receptor (TNFR superfamily, member 3)


1«C5 
JDZ 


f^A A ATA A A A A 

i-AAAl AAAAA 


21 198 


translocase of outer mitochondrial membrane 70 homolog A (yeast)


353 


TAGCATCAAT 


79877 


myotubularin related protein 6


JD4 


T A C*f* A TP A A *!• 

1 AGG A I G AAT 


169476 


glyceraldehyde-3-phosphate dehydrogenase 


JjJ 


t a a riT a a a 
1 AAG 1 Au^AA 


11 191 1 


ESTs, Weakly similar to T06291 extensin homolog T9E8.80


J3o 


rAAGTAGCAA 


239625 


integral membrane protein 2B 


357 


GAAGCAGGAC 


180370 


cofilin 1 (non-muscle) 


ICQ 

J5o . 


1 rAGCAATAA 


74346 


hypothetical protein MGC14353


^ en 


I 1AGGAATAA 


75798 


chromosome 20 open reading frame 111 


JOU 


LAAIUIU 11 A 


74823 


NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 1 (7.5kD, MWFE)


iol 


GAATGTGTTA 


181788 


ESTs


J02 


GAGGACCCAA 


77313 


cyclin-dependent kinase (CDC2-like) 10


i« 
JOJ 


GGGIGGIGAI 


9857 


dicarbonyl/L-xylulose reductase 


J04 


GGG 1 GG LTGG 


6551 


ATPase, H+ transporting, lysosomal interacting protein 1


365 


GTGCAGGGAG 


79414 


prostate epithelium-specific Ets transcription factor 


Job 


G 1 GG AGGGAG 


180403 


STRIN protein 


30f 


1 I AG 1 AAA! G 


155560 


calnexin 


joo 


TT A /* V I* AAA TV 

1 1AG1AAA1G 


7917 


DKFZP564K247 protein 




/-» A A A T A /""* A /""T> 

G AAA I AG AGT 


67201 


5',3'-nucleotidase, cytosolic




GAAATACAGT 


343475 


cathepsin D (lysosomal aspartyl protease)


1*71 
J /I 


pA A ATA A A AT" 

GAAAl AAAA1 


71465 


squalene epoxidase 


J /<£ 


1 GGA 1 G 1 GGT 


75410 


heat shock 70kD protein 5 (glucose-regulated protein, 78kD) 


J / J 


1 1 1G AGGGGA 






3 /** 


ill GG 1 G 1 1 I 


83190 


fatty acid synthase 


J /D 


1 AGG 1 G I GAT 


2962


S100 calcium binding protein P


J /O 


1AGGIG1GAT 


263455 


ESTs, Weakly similar to hypothetical protein FLJ20489 [Homo sapiens] [H.sapiens] 


311 


GGCCAGCCCT


155455 


phosphofructokinase, liver


7 "Ml 
J AO 


GGGGAGGGCT 


79 


hypothetical protein MGC15429


j/y 


GGl 1 IGAIGA 


89649 


epoxide hydrolase 1, microsomal (xenobiotic)


"3 OA 

joU 


GG ill GA1 GA 


279681 


heterogeneous nuclear ribonucleoprotein H3 (2H9) 


. J5l 


AATAAAGGCT 


1815 


myosin, light polypeptide 3, alkali; ventricular, skeletal, slow 


• Jo2 


A A f A A A /"^ 

AATAAAGGCT 


179735 


ras homolog gene family, member C 


Jo J 


GGl 1 1GGGG1 






Iff A 
Jo4 


a pttp a a r^r* 
GAG 1 1 GAAGG 


77667 


lymphocyte antigen 6 complex, locus E 


IDC 
JO J. 


ITCA1 ACAGC 






iff/; 

JoO 




182740 


ribosomal protein S11


387 


CCATTGCACT 


194382 


ataxia telangiectasia mutated (includes complementation groups A, C and D) 


388 


CCATTGCACT 


244378 


solute carrier family 2 (facilitated glucose transporter), member 6


389 


AAATAAAGAA 


14841 


ESTs


390 


AAATAAAGAA


355733 


microsomal glutathione S-transferase 1


391 


GGGTTGGCTT 


73818 


ubiquinol-cytochrome c reductase hinge protein 




392


ACTTTTTCAA 


133430 


•gsrS : ; ■■ r —t— 




393 


ACTTTTTCAA


246500 


EST


394 


CCCATCGTCC 






395


GCGGCTTTCC 


278431


SCO cytochrome oxidase deficient homolog 2 (yeast) 


396 


GGGAAGCAGA 



SEQ ID

NO: 



Tag 



CTGACCTGTG



CTGACCTGTG 



Unigene 



77961 



181244 



Gene name 



major histocompatibility complex, class I, B 



major histocompatibility complex, class I, A



ATTTTCTAAA 



9101 



nuclear receptor subfamily 4, group A, member 1 
anterior gradient 2 homolog (Xenopus laevis)



TGCTAAAAAA 



146550 
313761 



myosin, heavy polypeptide 9, non-muscle 
ESTs 



GGAATAAATT 



GTGTGTAAAA 



291904 



AGAAAAAAAA 



accessory protein BAP31



AGAAAAAAAA 



153834 pumilio homolog 1 (Drosophila) 



TCAAAAAAAA 



10846 



enolase 1, (alpha) 



polyamine N-acetyltransferase 



CTAAAAAAAA 



hypothetical protein MGC13064 



9873 



CTAAAAAAAA 



54457 



likely homolog of rat kinase D-interacting substance of 220 kDa 



CAAAAAAAAA 



126906 



CD81 antigen (target of antiproliferative antibody 1)



CAAAAAAAAA 



234355 



hypothetical protein FLJ12598



GACTCACTTT 



hypothetical protein FLJ22569



699 



peptidylprolyl isomerase B (cyclophilin B) 



312644 



sulfotransferase family, cytosolic, 1C, member 2 



279929 



GCAAAAAAAA 



gp25L2 protein 



4746 



GCAAAAAAAA 



91579 



hypothetical protein FLJ21324



CACTTGCCCT 



14779 



similar to HYPOTHETICAL 34.0 KDA PROTEIN ZK795.3 IN CHROMOSOME IV



CACTTGCCCT 



15977 



acetyl-Coenzyme A synthetase 2 (ADP forming) 



NADH dehydrogenase (ubiquinone) 1 beta subcomplex, 9 (22kD, B22) 



298275 



AAAAAAAAAA 



78713 



solute carrier family 38, member 2 



AAAAAAAAAA 



10235 



solute carrier family 25 (mitochondrial carrier; phosphate carrier), member 3 



GAAAAAAAAA 



12185 



chromosome 5 open reading frame 4 



protein phosphatase 1, regulatory (inhibitor) subunit 16A



99843 



GGGGACTGAA 



438 



DKFZP586N0721 protein 



GGGGACTGAA 
TTGAATTCCC 



mesenchyme homeo box 1 



3709 



171921 



low molecular mass ubiquinone-binding protein (9.5kD)

sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C



251064 



high-mobility group (nonhistone chromosomal) protein 14 



356285 



TTTCTGTTAA 



12101 



ESTs, Highly similar to HG14 HUMAN Nonhistone chromosomal protein HMG-14 [H.sapiens] 



TGATCTCCAA 



11050 



hypothetical protein LOC51242 



TGATCTCCAA 



F-box only protein 9



83190 



AAAGTCTAGA 



fatty acid synthase 



82932 



cyclin D1 (PRAD1: parathyroid adenomatosis 1)



75736 



TACATAATTA 



240443 



apolipoprotein D 



multiple endocrine neoplasia I 



transcobalamin I (vitamin B12 binding protein, R binder family)



177592 



ribosomal protein, large, P1



299465 



TAAGGAGCTG 



ribosomal protein S26 



355957 



TAAAAAAAAA 



80612 



ESTs, Highly similar to RS26 HUMAN 40S ribosomal protein S26 [H.sapiens] 



ubiquitin-conjugating enzyme E2A (RAD6 homolog) 



ribosomal protein S14 



SEQ ID
NO: 



453 
454 



456 



457 



458 



459 



460 



461 



462 



463 



Tag 



AACTAAAAAA 



AACTAAAAAA 



TAGGTTGTCT



TAGGTTGTCT 



TTAAAAAAAA 



TTAAAAAAAA 



AACTAACAAA 



AACTAACAAA 



CAAGGGCTTG 



AAGGCAATTT 



AAGGCAATTT 



CTCCTCACCT 
CTCCTCACCT 
GACTCTGGTG 



Unigene 



3297 



55921 



279860 



19054 



78825 



3297 



164170 
93213 
119122 
334859 



Gene name 



ribosomal protein S27a 



glutamyl-prolyl-tRNA synthetase 



tumor protein, translationally-controlled 1



ESTs, Highly similar to S06590 IgE-dependent histamine-releasing factor 



hypothetical protein PRO2521



matrin 3 



ESTs, Moderately similar to UQHUR7 ubiquitin 



ribosomal protein S27a 



RAP1B, member of RAS oncogene family



Homo sapiens cDNA FLJ11739 fis, clone HEMBA1005497



vascular Rab-GAP/TBC-containing 



BCL2-antagonist/killer 1 
ribosomal protein L13a
histone methyltransferase DOT1L 



467 



GACTCTGGTG 



356189 



468 

469



ATTCTCCAGT 



234518 



Homo sapiens, ribosomal protein S15a, clone MGC:44895 IMAGE:5580542, mRNA, complete cds



ribosomal protein L23 



470 
471 



TGATAATTCA 



endosulfine alpha 



171625 



hypothetical protein MGC14697



472 



GGGCTGGGGT 



473 



GCTTAACCTG 



474



GGATTTGGCC 



475 
476 
477
478 



GGATTTGGCC 
TGCACGTTTT 
GCATAATAGG 



350068 



sperm associated antigen 7 



ribosomal protein L29 



82506 



glutamate dehydrogenase 1



343426 
169793 
356482 



KIAA1254 protein 



ESTs 

ribosomal protein L32 

ESTs, Weakly similar to putative 60S ribosomal protein L. 



thaliana] [A.thaliana]



479 
480 



GCACAAGAAG 



ribosomal protein L21 



289721 



growth arrest-specific 5 



481 
482 



TCAGATCTTT 



ribosomal protein S14 



108124 



ribosomal protein S4, X-linked 



483 
484 



GACAAAAAAA 



356505 



ribosomal protein S15a



485 
486 



GGAACAAACA 
GGAACAAACA 
CTAACTTCGT 



197345 
286124 



ESTs, Moderately similar to RSI A ARATH 40S ribosomal protein S15A [A.thaliana] 



thyroid autoantigen 70kD (Ku antigen) 
CD24 antigen (small cell lung carcinoma cluster 4 antigen)



14838 likely ortholog of mouse NPC derived proline rich protein 1



488 
489 



TGGCGTGGCC 



8854 



eukaryotic translation elongation factor 1 delta (guanine nucleotide exchange protein)



235768 
89388 



Pvtl oncogene homolog, MYC activator (mouse) 



NK inhibitory receptor precursor 



491 
492 



TGGCGTACGG 



Homo sapiens cDNA FLJ31372 fis, clone NB9N42000281



493 
494 
495 
496 



GGAGCGTGGG 
ACAGCGGCAA 
ACAGCGGCAA 
TCAAGTTCAC 



286226 
323462 
349499 
351928 



myosin IC 

DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 30
desmoplakin (DPI, DPII)

Homo sapiens mRNA full length insert cDNA clone EUROIMAGE 1977059



497 
498 



GGAAGCACGG 



148495



ESTs, Weakly similar to T05691 multiubiquitin chain-binding protein MBP1 



CAGTTACAAA 



7910 




proteasome (prosome, macropain) 26S subunit, non-ATPase, 4



RING1 and YY1 binding protein 



500 
501 



502 



CAGTTACAAA 
CAGGACAGTT 
GGGGAAATCG 
CAAATCCAAA 



312857 
78305 
76293 



ESTs 

RAB2, member RAS oncogene family 
thymosin, beta 10 



503 



TCAGAAGTTT 



504 
505 



243901 



mitogen-activated protein kinase kinase kinase kinase 3 



AAAGTTCTCA 



284243 



Homo sapiens mRNA; cDNA DKFZp564C1563 (from clone DKFZp564C1563)



AAGGATGCCA 



169946 



transmembrane 4 superfamily member tetraspan NET-6 



AAGGATGCCA 
GAGGGCCQGT 
CAGCAGAAGC 



104823 
36727 
323806 



GATA binding protein 3 



EST
H2A histone family, member J
small EDRK-rich factor 2 



SEQ ID
NO: 



Tag 



Unigene 



Gene name 



343261 



510 
511 



CCTCCAGCTA 



242463 



histocompatibility (minor) 13 



keratin 8 



512



GCCTTCCAAT 



76053



ESTs, Moderately similar to 137982 keratin 8



513 



GGGAGCCCGG 



183986 1 



DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 5 (RNA helicase, 68kD)



poliovirus receptor-related 2 (herpesvirus entry mediator B) 



synaptogyrin 2 




enolase 1, (alpha) 




interferon, alpha-inducible protein 27 
anterior gradient protein 3



532 
533 



AAGAAAACCT 



100686 



534 



535 



274319 



AGATTCAAAC 



14368 



hypothetical protein FLJ10509



TGGGGAGAGG 



SH3 domain binding glutamic acid-rich protein like 



181307 



H3 histone, family 3A 



537 
538 



367720 ESTs, Highly similar to HSHU33 histone H3.3 



79136 



LIV-1 protein, estrogen regulated 



540 



541 



GTGCTGAATG 



77385 
120260 



myosin, light polypeptide 6, alkali, smooth muscle and non-muscle 



AACGCGGCCA 



60300 



immunoglobulin superfamily receptor translocation associated 1



hypothetical protein MGC17552



macrophage migration inhibitory factor (glycosylation-inhibiting factor) 



544 
545 



300954 



GGCAACGTGG 
CGCCGCGGTG 



31608 



Huntingtin interacting protein K 



transient receptor potential cation channel, subfamily M, member 4



4835 



eukaryotic translation initiation factor 3, subunit 8 (110kD)



546 



GTGACCACGG 



299882 



ESTs, Highly similar to N-methyl-D-aspartate receptor 2C subunit precursor [Homo sapiens] [H.sapiens]



548 
549 



GGTGGCACTC 



77273 



GGTGGCACTC 



77550 



ras homolog gene family, member A 



p53-regulated DDA3 



9265 
3764 



mitochondrial ribosomal protein L24 



552 



TGCCTCTGCG 



guanylate kinase 1 



554 



555 



556 



557 



TCCCTGGCTG 



166160 



prosaposin (variant Gaucher disease and variant metachromatic leukodystrophy) 



GACGACACGA 



153177 



acetyl-Coenzyme A acyltransferase 1 (peroxisomal 3-oxoacyl-Coenzyme A thiolase)



GACGACACGA 



ribosomal protein S28 



GTGCTGGACC 



374547 
20977 



ESTs, Moderately similar to RS28 ARATH 40S ribosomal protein S28 [A.thaliana]



ganglioside-induced differentiation-associated protein 1-like 1



559 
560 



GCAGGCCAAG 



179774 
69771 



proteasome (prosome, macropain) activator subunit 2 (PA28 beta) 



B-factor, properdin 



RAB30, member RAS oncogene family 



SEQ ID

NO: 



564 
565 
566 
567 
568 



Tag 



TAGAAAAATA 
TAGAAAAATA 
AAGACAGTGG 
AAGACAGTGG 
TGTGCTAAAT 



Unigene 



79194 
279789
3352 
296290 
250895 



cAMP responsive element binding protein 1
glucose phosphate isomerase
histone deacetylase 2
ribosomal protein L37a 
ribosomal protein L34 



Gene name 



11387 



KIAA1453 protein 



570 
571 



GGCAAGAAGA 



83321 



572 
573 



GGCAAGAAGA 



neuromedin B 



11161 



ribosomal protein L27 



574 



TTGGTCCTCT 



575 



TTGGTCCTCT 



576 



GTGTGGGGGG 



577 



GTGTGGGGGG 



578 
579 



CGTGGGTGGG 



cytochrome c 



Homo sapiens E1BP1 pseudogene, mRNA sequence 



ribosomal protein L41 



2340 junction plakoglobin 



117484 



ESTs 



202833 



heme oxygenase (decycling) 1 



580 
581 
582 
583 



GCCGTTCTTA 



ribosomal protein L38 



ACCCGCCGGG 
GGCCTGCTGC 
GGCCTGCTGC 



280792 
9634 



584 
585 



GGTTTGGCTT 



73818 



hypothetical protein FLJ12387 similar to kinesin light chain
hypothetical protein BC009925



ubiquinol-cytochrome c reductase hinge protein 



121397 
15318 



ESTs 

HS1 binding protein 



587 
588 



CTAACTAGTT 



589 
590 



AAGGTGGAGG 



76171 



CCAAT/enhancer binding protein (C/EBP), alpha 



163593 



591 
592 



AGGCTACGGA 



119122 



ribosomal protein L18a



AGGCTACGGA 



356678 



ribosomal protein L13a 



ESTs, Weakly similar to T07697 ribosomal protein L13a, cytosolic



594 
595 



TCACAAGCAA 



4112 t-complex 1 



32916 



nascent-polypeptide-associated complex alpha polypeptide



241432 



ESTs, Highly similar to c380Al.lb [H.sapiens] 



596 
597 



110695 



hypothetical protein MGC3133 



598 



GGACCACTGA 



ribosomal protein 13 



356258 



599 
600 
601 



GCGGTGAGGT 
CAATAAACTG 
CAATAAACTG 



203910 
150580 
297112 



ESTs, Weakly similar to ribosomal protein [Arabidopsis thaliana] [A.thaliana]



602 



small glutamine-rich tetratricopeptide repeat (TPR)-containing 

putative translation initiation factor



AGGAAAGCTG 



227591 



603 



AGGAAAGCTG 



hypothetical protein FLJ11088



343443 



604 



CTGGGTTAAT 



356647 



ribosomal protein L36 



ESTs 




614 



CCAGAACAGA 



334807 



deoxythymidylate kinase (thymidylate kinase)



615 
616 
617 
618 
619 



GCATTTAAAT 
GCATTTAAAT 
GAAAAATGGT 



ribosomal protein, L30 



275959 
356184 
181357 



eukaryotic translation elongation factor 1 beta 2 
ESTs, Weakly similar to elongation factor 1-beta, putative [Arabidopsis thaliana] [A.thaliana]
laminin receptor 1 (67kD, ribosomal protein SA)



GAAAAATGGT 
GGTTGGCAGG 



356267
3745 



Homo sapiens laminin receptor-like protein LAMRL5 mRNA, complete cds
milk fat globule-EGF factor 8 protein



SEQ ID
NO: 



620 



622 
623 



GGTTGGCAGG 



GTGAAGGCAG 



GTGAAGGCAG



Unigene 



17908 



77039 



356568 



Gene name 



origin recognition complex, subunit 1-like (yeast)



ribosomal protein S3A



ESTs, Weakly similar to Putative S-phase-specific ribosomal protein [Arabidopsis thaliana] [A.thaliana]



625 



626 



8036 



ATCTCAGCTC 



29736 



RAB3D, member RAS oncogene family 



AAAAAATTCA 



TNF receptor-associated factor 5 



254271 



hypothetical protein MGC24009 



627 
628 



TGGCCCCACC



146662 



Homo sapiens cDNA FLJ36928 fis, clone BRACE2005216, weakly similar to Xenopus laevis bicaudal-C (Bic-C) mRNA



pyruvate kinase, muscle 



syndecan 4 (amphiglycan, ryudocan) 



631 



632 
633 



CAACTGGAGT 



352566 



catenin (cadherin-associated protein), delta 1



GCCCAGCTGG 



12479 



cytochrome P450 monooxygenase 



GCCCAGCTGG 



334798 



associated molecule with the SH3 domain of STAM 



hypothetical protein FLJ20897 



635 
636 



ATGAAACCCC 



75470 



endothelial cell growth factor 1 (platelet-derived) 



chromosome 1 open reading frame 29 



637 
638 



226396 



AGCCACCGCA 



hypothetical protein FLJ11126



242 



glucose-6-phosphatase, catalytic (glycogen storage disease type I, von Gierke disease)



639 



640 



641 



642 
643 



244482 



CCCAGCTAAT 



73809 



M-phase phosphoprotein, mpp8 



CCCAGCTAAT 



200395 



arachidonate 15-lipoxygenase



GTGAAACCCC



centromere protein H 



44396 coronin, actin binding protein, 2A 



GTGAAACCCC 



323949 



kangai 1 (suppression of tumorigenicity 6, prostate; CD82 antigen (R2 leukocyte antigen, antigen detected by monoclonal and antibody IA4))



644 



645 



646 
647 



289053 



GTGAAACCCT 



52644 



CAP-binding protein complex interacting protein 2 



GAGAAACCCC 



5719 



src family associated phosphoprotein 2 



GAGAAACCCC 



114318 



chromosome condensation-related SMC-associated protein 1 



hypothetical protein MGC16385 



648 
649 



365695 



GTGAAACCTT 



264636 



Homo sapiens cDNA FLJ11083 fis, clone PLACE1005232



FK506 binding protein 14 (22 kDa) 



650 



651 
652 



75410 



GTGAAACTCC 



256158 



heat shock 70kD protein 5 (glucose-regulated protein, 78kD) 



GTGAAATCCC 



hypothetical protein BC018697



274448 



hypothetical protein FLJ11029 



653 
654 



287587 



AACCCGGGAG 



118744 



Homo sapiens cDNA FLJ13671 fis, clone PLACE1011729



AACCCGGGAG 



KIAA0408 gene product



173936 



interleukin 10 receptor, beta 



6874 



KIAA0472 protein 



656 



657 



658 
659 



GTGGCCGGCA 



169813 



TTGCCCAGGC 



hypothetical protein FLJ23040



TTGCCCAGGC 



9711 
286124 



novel protein 



CD24 antigen (small cell lung carcinoma cluster 4 antigen) 



289020
17173 



Homo sapiens cDNA FLJ11553 fis, clone HEMBA1003034



solute carrier family 14 (urea transporter), member 1 (Kidd blood group)



661 
662 



181874 interferon-induced protein with tetratricopeptide repeats 4 



670 
671 



stromal cell protein 




CCTGGCTAAT 



117062 



apoptosis-inducing factor (AIF)-homologous mitochondrion-associated inducer of death
Homo sapiens cDNA FLJ12339 fis, clone MAMMA1002250



301S09 
9280 



proteasome (prosome, macropain) subunit, beta type, 9 (large multifunctional protease 2) 



SEQ ID
NO: 



673 
674 



Tag 



GTGGCACGTG 



Unigene 



29759 



Gene name 



polymerase I and transcript release factor 



675 



306850 



GTGGCTCACA 



270134 



Homo sapiens cDNA: FLJ22796 fis, clone KAIA2544



hypothetical protein FLJ20280 



677 



678 



GTGGCTCACA 



124813 



TGCCTGTAAT 



349344 



hypothetical protein MGC14817 



TGCCTGTAAT 



hypothetical protein BC001573 



342655 



Homo sapiens cDNA FLJ13289 fis, clone OVARC1001170



14992 



hypothetical protein FLJ11151



680 
681 



107003 



enhancer of invasion 10 



682 
683 



78060 



AGAATTGCTT 



phosphorylase kinase, beta 



19031



nephrosis 1, congenital, Finnish type (nephrin) 



ATCTTGGCTC 



75859 



mitochondrial ribosomal protein L49 



685 
686 



ATCTTGGCTC 



129228 



TTGGCCAGGA 



galactokinase 2 



146668 



KIAA1253 protein 



233335 



KIAA1465 protein



688 
689 



193384 



TTGACCAGGC 



194351 



putatative 28 kDa protein 



coagulation factor II (thrombin) receptor-like 2 



352382 



PI-3-kinase-related kinase SMG-1 



691 



355762 



AGCCACCACG 



57735 



Homo sapiens cDNA FLJ35653 fis, clone SPLEN2013690



scavenger receptor expressed by endothelial cells 



692 



693 



694 



695 



AGCCACCACG 



2593 



GTGAAACCCG 



phosphodiesterase 6B, cGMP-specific, rod, beta (congenital stationary night blindness 3, autosomal 
dominant) 



278577 



GTGAAACCCG 



302075 



Homo sapiens mRNA; cDNA DKFZp564P073 (from clone DKFZp564P073)



CCCGGCTAAT 



273759 1 



Homo sapiens cDNA FLJ12365 fis, clone MAMMA1002392 



Homo sapiens cDNA FLJ11905 fis, clone HEMBB1000050



325116 



JM11 protein



697 
698 



17311 



GTGAAACCCA 



hypothetical protein FLJ20004



241205 



peroxisomal membrane protein 4 (24kD) 



700 



701 



702 
703 



GTAAAACCCT 



281680 



GTAAAACCCT 



282797 



peroxisomal trans 2-enoyl CoA reductase; putative short chain alcohol dehydrogenase 



GTGAAACTCT 



188853 



Homo sapiens cDNA FLJ31194 fis, clone KIDNE2000510



GTGAAACTCT 



333449 



Homo sapiens cDNA FLJ12246 fis, clone MAMMA1001343



Homo sapiens cDNA FLJ12170 fis, clone MAMMA1000664



257584 



Homo sapiens cDNA FLJ12138 fis, clone MAMMA1000331 



296697 



Homo sapiens cDNA FLJ12093 fis, clone HEMBB1002603



705 
706 



707 



708 



280380 



GTGGCAGGTG 



aminopeptidase 



333480 



GCAAAACCCT 



10844 



Homo sapiens cDNA FLJ13757 fis, clone PLACE3000405



GCAAAACCCT 



121576 



leucine-rich alpha-2-glycoprotein 



myosin IB 



GCAAAACCCC 



86412 



chromosome 9 open reading frame 5 



710 
711 



GCAAAACCCC 



129708 



tumor necrosis factor (ligand) superfamily, member 14



209065 



hypothetical protein FLJ14225



713 
714 



715 



716 



AGGTCAGGAG 



212414 



AGCCACCGTG 



156051 



sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3E 



AGCCACCGTG 



KIAA1443 protein 



240845 



GTGGCACACA 



129057 



DKFZP434D146 protein 



GTGGCACACA 



207251 



breast carcinoma amplified sequence 1 



nucleolar autoantigen (55kD) similar to rat synaptonemal complex protein 



156942 



hypothetical protein BC017947



718 
719 



271285 



KIAA1510 protein



720 
721 



91728 



TTGGCCAGAC 



374296 



polymyositis/scleroderma autoantigen 1 (75kD) 



hypothetical protein similar to KIAA0187 gene product



48604 DKFZP434B168 protein 



723 



724 



725 



726 



53985 



CACCTGTAAT 



175613 



glycoprotein 2 (zymogen granule membrane) 



CACCTGTAAT 



TTGGCCAGGG 



287473 
321687 



claspin 

hypothetical protein FLJ11996



F-box protein FBX30 



TTGGCCAGGG 



322840 



Homo sapiens, Similar to protein tyrosine phosphatase-like (proline instead of catalytic arginine), member a,



SEQ ID

NO: 

727 

728 

729 
730

731 

732 

733 

734 


Tag 

GAGAAACCCT 
GAGAAACCCT 
GCGAAACCCT 
GCGAAACCCT 
GTGAAACCTC 
GTGAAACCTC 
GCGAAACCCC 
GCGAAACCCC 


Unigene 

321149 
274279 
103189 
225084
168159
334526
30211 
288945 


Gene name 

hypothetical protein FLJ10257

hypothetical protein FLJ10314

lipopolysaccharide specific response-68 protein

hypothetical protein FLJ14280

bifunctional apoptosis regulator

hypothetical protein MGC14126

hypothetical protein FLJ13448


735 

736 

737 

738 

739 

740

741 

742 


AGCCACCGCG 

AGCCACCGCG 

CGCCTGTAAT

CGCCTGTAAT 

GTGGCGGGCG 

GTGGCGGGCG 

AACCTGGGAG 

AACCTGGGAG 


122660
355874 
154443 
287594 
22926 
181780 
105658 
334638 


RAB, member of RAS oncogene family-like 2A

RAB, member of RAS oncogene family-like 2B

MCM4 minichromosome maintenance deficient 4 (S. cerevisiae)

hypothetical protein FLJ13769

KIAA0795 protein

hypothetical protein FLJ20241

DNA fragmentation factor, 45 kD, alpha polypeptide

hypothetical protein MGC16175


743 


GCTTTCTCAC 






744 
745 


CTTGTAATCC 
CTTGTAATCC 


183253 
231119 


nucleolar RNA-associated protein

protocadherin beta 9


746 


TCTGTAATCC 


272216 


glycoprotein VI (platelet) 


747 
748 


TCTGTAATCC 
CCTATAATCC 


142 
86228 


sulfotransferase family, cytosolic, 1A, phenol-preferring, member 1

TRIAD3 protein


749 
750 


CCTATAATCC 
TAATCCCAGC 


189658 
12496 


v^vji— it? ptuLwin 

Homo sapiens cDNA FLJ23834 fis, clone KAIA2087


751 


TAATCCCAGC 


278941 


PRO0628 protein


752 


TGCCTGTAGT


48469 


LIM domains containing 1


753 


TGCCTGTAGT 


274201 


winuiuuauuic i ujJCU I Calling liainC J J 


754 


AGGGTGTTTT 


75842 


dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 1A


755 


AGGGTGTTTT


160416 


ESTs


756 


CCAGGGCAAC 


240443 


multiple endocrine neoplasia I


757 


ATTGTGCCAC 


22151 


neurolysin (metallopeptidase M3 family)


758 
759


ATTGTGCCAC 


38761 


Homo sapiens cDNA: FLJ21564 fis, clone COL06452




CCTGTAATCT 


199067 


v-erb-b2 erythroblastic leukemia viral oncogene homolog 3 (avian) 


760
761


CCTGTAATCT 
GTGGTGGGCA 


3530 
99975 


FUS interacting protein (serine-arginine rich) 1
cholinergic receptor, nicotinic, delta polypeptide


762 


GTGGTGGGCA 


374536 


isovaleryl Coenzyme A dehydrogenase 


763 
764 
765 


TACCCTAAAA 
TACCCTAAAA 
ATGGTGGGGG 


165662 
268971 
343586 


KIAA0675 gene product

Homo sapiens clone IMAGE:212461, mRNA sequence

zinc finger protein 36, C3H type, homolog (mouse)


766 
767 
768 
769 
770 
771 
772 
773 
774 
775 
776 
777 
778 
779 
780 
781 
782 


ACCCTTGGCC 

GTGAAAACCC 

GTGAAAACCC 

ATCCACCCGC 

ATCCACCCGC 

TTAGCCAGGA 

TTAGCCAGGA 

ATGAAACCCT 

ATGAAACCCT 

GTGGCTCACG 

GTGGCTCACG 

TTGGCCAGGC 

TTGGCCAGGC

TTGGTCAGGC

TTGGTCAGGC 

TTGTCCAGGC 

TTGTCCAGGC 


127305 
351029 
145381 

53263 
196270 
350692 

31330 
187991 
3454 
127649 
118194 
274382 
154069 
172012 

99423 

51305 


agmatine ureohydrolase (agmatinase) 

Homo sapiens cDNA FLJ31803 fis, clone NT2RI2009101

general transcription factor IIE, polypeptide 1 (alpha subunit, 56kD) 

nucleoporin Nup43

i^iaic uoitd|jijiic:i/waiiici 

Homo sapiens cDNA FLJ32756 fis, clone TESTI2001758

Homo sapiens clone HQ0319

SOCS box-containing WD protein SWiP-1

KIAA1821 protein

zinc finger protein 297B

debranching enzyme homolog 1 (S. cerevisiae)

protein kinase, interferon-inducible double stranded RNA dependent

melan-A

hypothetical protein DKFZp434J037

ATP-dependent RNA helicase

v-maf musculoaponeurotic fibrosarcoma oncogene homolog F (avian)



SEQ ID

NO:



Tag 



CTTAATCTTG 



Unigene 



75462 



Gene name 



BTG family, member 2 



stromal cell-derived factor 1 



62954 



TGGGGTTCTT 



272499 



ferritin, heavy polypeptide 1 



AAGAAGATAG 



350046 



dehydrogenase/reductase (SDR family) member 2 



ribosomal protein L23a 



AGAATCGCTT 



16165 



ESTs, Highly similar to RL2B HUMAN 60S ribosomal protein L23a [H.sapiens]



expressed in activated T/LAK lymphocytes 



CCTGTAGTCC 



51305 



coatomer protein complex, subunit alpha 

v-maf musculoaponeurotic fibrosarcoma oncogene homolog F (avian)
hvoothetical nrntein ft rifton "** - 



CCTGTAGTCC 



77510 



AGCCACCACA 



5999 



hypothetical protein FLJ10520



hypothetical protein FLJ10298



8768 



hypothetical protein FLJ10849



210778 



hypothetical protein FLJ10989



Homo sapiens cDNA FLJ11405 fis, clone HEMBA1000769



287515 



CCACTGTACT 



288537 



hypothetical protein FLJ12331



CTGTACTTGT 



75678 



Homo sapiens cDNA FLJ12199 fis, clone MAMMA1000880



FBJ murine osteosarcoma viral oncogene homolog B 



CCATTCTCCT 



271752 



hypothetical protein BC006136 
3'(2'), 5'-bisphosphate nucleotidase 1



73614 



solute carrier family 31 (copper transporters), member 1



287522 



AGCCACTGCG 



AGCCACTGCG 



193914 
356075 



Homo sapiens cDNA FLJ12364 fis, clone MAMMA1002384



KIAA0575 gene product 



GCCGGCTCAT 



ninjurin 2 



GCTCACTGCA 



93523 



peptidylprolyl isomerase (cyclophilin)-like 2



117572 
120769 



chemokine binding protein 2 



CCTQTGGTCC 



Homo sapiens cDNA FLJ20463 fis, clone KAT06143



243804 Homo sapiens cDNA FLJ13800 fis, clone THYRO1000156



306189 



GGAGGCTGAG 



DKFZP434F1735 protein



185973 



degenerative spermatocyte homolog, lipid desaturase (Drosophila) 



130815 



AGAATCACTT 



192127 



hypothetical protein FLJ21870



Homo sapiens, clone MGC:32020 IMAGE:4620233, mRNA, complete cds



129908 



kinesin family member 1B



306678



AGCCACTGCA 



hypothetical protein FLJ14326



4295 



AGCCACTGCA 



173508 



proteasome (prosome, macropain) 26S subunit, non-ATPase, 12
P3ECSL



AACCCAGGAG 



262150 



hypothetical protein FLJ22814



75813 



polycystic kidney disease 1 (autosomal dominant) 



10326 



coatomer protein complex, subunit epsilon 



119324 



kinesin-like 4 




GCCGTGTCCG 



356666 



GCCGTGTCCG 



350166 



ESTs, Highly similar to RS6 HUMAN 40S ribosomal protein S6 (Phosphoprotein NP33) [H.sapiens]
ribosomal protein S6



CCCATCCGAA 



356175 



ribosomal protein L26 



ESTs, Weakly similar to T460S7 60S RIBOSOMAL PROTEIN-like 



CCCGAGGCAG 



45057 



Homo sapiens, Similar to doublecortin and CaM kinase-like 1, clone MGC:45428 IMAGE:5532881, mRNA, complete cds



CCTGAAATTT 



77492 heterogeneous nuclear ribonucleoprotein A0



12102 
9585 



sorting nexin 3



CTCACTTTTT 



76722 



cDNA FLJ30010 fis, clone 3NB692000154



CCAAT/enhancer binding protein (C/EBP), delta 



SEQ ID 
NO: 
837 


Tag 

GCTGTTGCGC 


Unigene 
8102 


Gene name 

ribosomal protein S20


838 


TCCCCGTACA 






839 


CACAAACGGT 


195453 


ribosomal protein S27 (metallopanstimulin 1)


840 


CACAAACGGT 


356178 


ESTs,. Moderately similar to T47903 ribosomal nrntein qji " 


841 


CCCTGATTTT 


183684 


eukaryotic translation initiation factor 4 gamma, 2 


842


CCCTGATTTT


1799 


CD1D antigen, d polypeptide


843 


TGGGCAAAGC 


12186




844 


TAACTTGTGA 


295726 


integrin, alpha V (vitronectin receptor, alpha polypeptide, antigen CD51)


845 


AGCACCTCCA 


75309 




846 


GAGGGAGTTT 


76064 




847 


GAGGGAGTTT 


356342 


ESTs, Highly similar to 2113200C ribosomal protein L27a [Homo sapiens] [H.sapiens]


848 


GCGACAGCTC 


184582 




849 


CGCCGCCGGC 


182825 




850 


GGCAAGCCCC 


334895 




851 


GGCAAGCCCC 


187577 




852 


AGCTCTCCCT 


82202 




853 


AGCTCTCCCT 


374588


^ **> oitimui wj xwiiuM iiuuauuioi protein Lit /, cytosoiic 


854 


CGCTGGTTCC 


179943 




855 


CGCTGGTTCC 


289019 


luvwiu uuiioiui i mug gtuwm loviur ucia oinuing proiein J 


856


GAAACCGAGG 


268053 


R3H domain (binds single-stranded nucleic acids) containing


857 


GAAACCGAGG 


279813 





858 


GAGGTCCCTG 


374499 


* similar to raoz akaih rroteasome subunit alpha type 6-2 (20S proteasome alpha subunit A2) 
[Athaliana] 


859 


GAGGTCCCTG 


74077 


r* ^iwijwiiiw| iiiawiupauiy auuuiui, ai^Jiia LjPC, O 


860 


TGAAATAAAA 


9614 




861 


TGAAATAAAA 


48516 


ESTs


862 


CCCCAGCCAG 


252259 




863 


CCCCAGCCAG 


334861 




864 


TAAATAATTT 


1197 


heat shock 10kD protein 1 (chaperonin 10)


865 


ATAATTCTTT


288806 


Homo sapiens cDN A FLTl 1778 fis clone HEVfRAIflfHQi 1 


866 


ATAATTCTTT 


539 




867 


TTAAACCTCA 


170311




868


TTAAACCTCA 


347810 


ESTs


869 


GCCGAGGAAG


339696 


ribosomal protein S12


870


GCCGAGGAAG 


143067 




871 


GCCTGTATGA 


180450 




872 


GCCTGTATGA 


356794 




873 


GTGTTAACCA 


74267 


ribosomal protein L15


874 


CTTCGAAACT 


51299 




875 


AAGGTCGAGC 


184582 




876 


AAGGTCGAGC 


356004 


ESTs, Weakly similar to T47559 60S ribosomal protein-like


877 


CTTTGGAAAT 


6820 




878 


CTTTGGAAAT


184222 




879


CCCCCTGGAT 


275243 




880 


CGCCGGAACA 


356448 


ESTs, Weakly similar to RL4B ARATH 60S ribosomal protein L4-B (L1) [A.thaliana]


881 


CGCCGGAACA 


286 


ribosomal protein L4 


882 


GTGTTGCACA 


301251 


Homo sapiens cDNA FLJ12014 fis, clone HEMBB1001685


883 


GTGTTGCACA 


165590 


ribosomal protein S13


884 


CAACTTAGTT 


180224 


myosin regulatory light chain 


885 


GGGGCAGGGC 


9383 


cysteine-rich with EGF-like domains 1


886 


CCAAGTTTTT


75914 


coated vesicle membrane protein 


887 


TTGGCAGCCC 


76064 


ribosomal protein L27a


888 


GTTAACGTCC 


178391 


ribosomal protein L36a


889 


GTTAACGTCC 


355599 


ESTs, Moderately similar to putative ribosomal protein [Arabidopsis thaliana] [A.thaliana]


890


GGAAGTTTCG 


55847 


mitochondrial ribosomal protein L51 


891 


CCCGTCCGGA


180842 


ribosomal protein L13
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Table 3. Genes employed for the clustering analysis shown in Fig. 3B 



SEQ ID 
NO: 



Tag 



CCCGTCCGGA 



GGCCGCGTTC 



GGCCGCGTTC 



Unigene 



356148 



5174 



356626 



Gene name



ESTs, Weakly similar to 60S ribosomal protein L13 [Arabidopsis thaliana] [A.thaliana]
ribosomal protein S17



Homo sapiens cDNA FLJ34449 fis, clone HLUNG2002145 



172182 poly(A) binding protein, cytoplasmic 1



AACTCCCAGT 



354497 



110571 



ESTs 



growth arrest and DNA-damage-inducible, beta 



118126 



CACTTTTGGG 



321497 



protective protein for beta-galactosidase (galactosialidosis) 



Homo sapiens cDNA FLJ31347 fis, clone MESAN2000023



334851 



GGGAGGGAAG 



75243 



LIM and SH3 protein 1 



bromodomain containing 2 



160953 



p53-regulated apoptosis-inducing protein 1



129548 



heterogeneous nuclear ribonucleoprotein K 



180900 



TCCCCGTGGC 



75616 



Williams-Beuren syndrome chromosome region 1 



TCCCCGTGGC 



356547 



24-dehydrocholesterol reductase 



hypothetical protein BC016005



31439 



GCCTGCAGTC 



273385 



serine protease inhibitor, Kunitz type, 2 



GNAS complex locus 



250655 



AGAATTTGCA 



374658 



prothymosin, alpha (gene sequence 28) 



ESTs, Highly similar to TNHUA prothymosin alpha 



CACACAGTTT 



4055 



204354 



Homo sapiens mRNA; cDNA DKFZp564C2063 (from clone DKFZp564C2063)



ras homolog gene family, member B 



AGAGGTGTAG 



TTAGCCAGGC 



71367 



similar to RIKEN cDNA 1110058L19



161640 



tyrosine aminotransferase 



TGGAAAGTGA 



101047 



v-fos FBJ murine osteosarcoma viral oncogene homolog 



transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47)



AGGAGCGGGG 



252189 



GCCCCTCCGG 



83753 



syndecan 4 (amphiglycan, ryudocan) 



small nuclear ribonucleoprotein polypeptides B and B1



GCTGCCCTTG 



16.7Kd protein 



348557 



tubulin alpha 6 



272897 



CCACCCCGAA 



tubulin, alpha 3 



74637 



GCTGCGGTCC 



795 



testis enhanced gene transcript (BAX inhibitor 1) 



H2A histone family, member O 



GAGATCCGCA 



RD RNA-binding protein 



75348 proteasome (prosome, macropain) activator subunit 1 (PA28 alpha) 



8997 



Sad1 unc-84 domain protein 1



GCAAGCCAAC 



181002 



MLL septin-like fusion



85155 



zinc finger protein 36, C3H type-like 1 




GTCCGAGTGC 



TAACAGCCAG 



81328 



transmembrane 4 superfamily member 1 



nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha



GCCTTGGGTG 



hypothetical protein FLJ14075



2250 



leukemia inhibitory factor (cholinergic differentiation factor) 



spermidine/spermine N1-acetyltransferase



945 



946 



947 



13323 hypothetical protein FLJ22059



5372 



claudin 4 



ATCGTGGCGG 



8026 



sestrin 2 



CCTGGCCTAA 



CCTGGCCTAA 



297285 
111676 



ESTs, Weakly similar to ZF37 HUMAN Zinc finger protein ZFP-37 [H.sapiens] 



protein kinase H11
Table 3. Genes employed for the clustering analysis shown in Fig. 3B



SEQ ID
NO:
948
949
950
951
952
953
954 
955 


Tag 

AAGATTGGTG 
AATCCTGTGG 
AATCCTGTGG 
TGGTGTTGAG 
TGGTGTTGAG
CTGGCCCTCG 
CTGGCCCTCG 
GACTCTTCAG 


Unigene 

1244 
43910
178551 
275865 
374510
350470 
43654 
234726 


Gene name 

CD9 antigen (p24)

CD164 antigen, sialomucin

ribosomal protein L8

ribosomal protein S18

ESTs, Highly similar to S30393 ribosomal protein S18, cytosolic

trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in)

ceroid-lipofuscinosis, neuronal 6, late infantile, variant


956 
957 
958 
959 
960 
961 
962 

963 
964 
965 
966 
967 

968
969 
970 
971 
972 
973 
974 
975 
976 
977 
978 
979


CTGCCAACTT 

GTGCGCTGAG 

GTGCGCTGAG 

TTGGGGTTTC 

TTGGGGTTTC 

GGAGGGGGCT 

GGAGGGGGCT 

TTAGTTTTTA 

TTAGTTTTTA

CCCAAGCTAG 

CCCAAGCTAG 

GTGCACTGAG 

GTGCACTGAG 

CAGACTTTTT 

CAGACTTTTT 

AAAACATTCT 

CACCTAATTG 

GGGACGAGTG 

CAAGCATCCC 

AGCAGATCAG 

AGCCCTACAA 

TGAAGTAACA 

GCTAGGl"iT'A 

CAAAATCAGG 


180370 
181244 
277477 

62954 
374602 

77886 
110642 

323949 
274404 

76067 
374617 
181244 
277477 
293884 

78683 
323562 

119301 
95243 
150580 

79933 


serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 3

cofilin 1 (non-muscle)

major histocompatibility complex, class I, A

major histocompatibility complex, class I, C

ferritin, heavy polypeptide 1

ESTs, Weakly similar to putative ferritin [Arabidopsis thaliana] [A.thaliana]

lamin A/C

neurotensin receptor 1 (high affinity)

kangai 1 (suppression of tumorigenicity 6, prostate; CD82 antigen (R2 leukocyte antigen, antigen detected by
monoclonal and antibody IA4))

plasminogen activator, tissue

heat shock 27kD protein 1

ESTs, Highly similar to HHHU27 heat shock protein 27

major histocompatibility complex, class I, A

major histocompatibility complex, class I, C

helicase/primase complex protein

ubiquitin specific protease 7 (herpes virus-associated)

hypothetical protein DKFZp564K142 similar to implantation-associated protein

S100 calcium binding protein A10 (annexin II ligand, calpactin I, light polypeptide (p11))

transcription elongation factor A (SII)-like 1

putative translation initiation factor

cyclin I


980 
981 
982 
983 
984 
985 
986 
987 
988 
989 
990 
991 
992 
993 
994 

995 
996 
997 
998 
999 
1000 
1001 
1002 


GGCTGGGGGC 
GGCTGGGGGC 
GGCCCTAGGC 
GCTGAACGCG 
AAGAGCGCCG 
AAGAGCGCCG 
AGGGTGAAAC 
AGGGTGAAAC 
GATCCCAACT 
GCCTACCCGA 
CCAGGAGGAA 
CCAGGAGGAA 
CCAGTGGCCC 
CCAGTGGCCC 
GAAGCTTTGC 

GAAGCTTTGC 

TGTGTTGAGA 

TGTGTTGAGA 

GTGACAGAAG 

GTGACAGAAG 

CCTCGGAAAA

CCTCGGAAAA 

CTCATAAGGA 


75721 
352407 

78909 

99029 
8997 
274402 

77608 
363356 
118786 

23582 
276 
180414 
180920 
356713 
289088 

356532 
181165 
356428 
129673 
356129 
2017 
343481 


profilin 1

chromosome 1 amplified sequence 3

zinc finger protein 36, C3H type-like 2

CCAAT/enhancer binding protein (C/EBP), beta

Sad1 unc-84 domain protein 1

heat shock 70kD protein 1B

splicing factor, arginine/serine-rich 9

EST

metallothionein 2A

tumor-associated calcium signal transducer 2

farnesyltransferase, CAAX box, beta

heat shock 70kD protein 8

ESTs, Moderately similar to T49955 40S ribosomal protein-like

heat shock 90kD protein 1, alpha

ESTs, Moderately similar to 1908431A heat shock protein HSP81-1 [Arabidopsis thaliana] [A.thaliana]

eukaryotic translation elongation factor 1 alpha 1

Homo sapiens mRNA expressed only in placental villi, clone SMAP83

eukaryotic translation initiation factor 4A, isoform 1

ESTs, Weakly similar to JC1453 translation initiation factor eIF-4A2

ribosomal protein L38

ESTs, Weakly similar to RL38 ARATH 60S ribosomal protein L38 [A.thaliana]
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Table 3. Genes employed for the clustering analysis shown in Fig. 3B 



SEQ ID

NO: 


Tag 


Unigene 


Gene name


1003 


CTAGCCTCAC 


14376 


actin, gamma 1


1004


GGGCCAACCC 


119475


cold inducible RNA binding protein


1005


GGGCCAACCC 


226795 


glutathione S-transferase pi 


1006


ACCCCCCCGC 


2780 


jun D proto-oncogene 


1007 


GGTGCCCAGT 


75607 


myristoylated alanine-rich protein kinase C substrate


1008 


uCl I lAl \ 1U 


288061 


actin, beta


1009
1010 


GGCTCCCACT 
CTAAG ACTIO 


74335


heat shock 90kD protein 1, beta


1011 


GGGTAGCTGG 






1012 


ACCCACGTCA


298184 


potassium voltage-gated channel, shaker-related subfamily, beta member 2 


1013 


ACCCACGTCA


198951 


jun B proto-oncogene 


1014 


GGGCAGGCGT 


737 


immediate early protein


1015 


GTTCACTGCA 


77318 


platelet-activating factor acetylhydrolase, isoform Ib, alpha subunit (45kD)


1016 


GTTCACTGCA 


168383 


intercellular adhesion molecule 1 (CD54), human rhinovirus receptor 


1017 


ACTCAGCCCG 


101382 


tumor necrosis factor, alpha-induced protein 2 


1018 


ACTCAGCCCG 


4990 


KIAA1089 protein


1019 


TGATTTCACT






1020 


AGGTTTCCTC


9736 


proteasome (prosome, macropain) 26S subunit, non-ATPase, 3 


1021 


ACCATCCTGC 


32963 


cadherin 6, type 2, K-cadherin (fetal kidney)


1022 


ACCATCCTGC 


76095 


immediate early response 3


1023 


GGGAGGTAGC 


171825 


basic helix-loop-helix domain containing, class B, 2 


1024 


CCGTCCAAGG 


80617 


ribosomal protein S16


1025 


CTCACCGCCC 


183650 


cellular retinoic acid binding protein 2 


1026 


CCCGCCCCCG 


155048 


Lutheran blood group (Auberger b antigen included) 


1027 


ACTAACACCC 






1028 


CACTACTCAC 






1029 


CAGGAGGAGT 


289101 


glucose regulated protein, 58kD 


1030 


CAGGAGGAGT


356023 


ESTs, Weakly similar to PDI2 ARATH Probable protein disulfide isomerase 2 precursor (PDI) [A.thaliana]


1031 


GCGACCGTCA 


273415 


aldolase A, fructose-bisphosphate


1032 


AAGGGAGGGT 


182248 


sequestosome 1 


1033 


GGCAGCCAGA 


75061 


macrophage myristoylated alanine-rich C kinase substrate 


1034 


GGCAGCCAGA 


144501 


ESTs


1035 


TGTGGGTGCT 


306339 


Homo sapiens mRNA; cDNA DKFZp586N2022 (from clone DKFZp586N2022) 


1036 


TGTGGGTGCT 


194657 


cadherin 1, type 1, E-cadherin (epithelial) 


1037 


AlTiUAUAAU 


178658 


RAD23 homolog B (S. cerevisiae) 


1038 


AATGGAAATC 


4943 


melanoma antigen, family D, 2


1039 


AATGGAAATC 


58103 


A kinase (PRKA) anchor protein (yotiao) 9 


1040 


1 HUUUCCTA 


17409 


cysteine rich protein (CRP1)


1041


CAACTAATTC 


69997 


zinc finger protein 238 


1042 


CAAGTAATTC 


75106 


clusterin (complement lysis inhibitor, SP-40,40, sulfated glycoprotein 2, testosterone-repressed prostate
message 2, apolipoprotein J)


1043 


UTIUtGGTFa 


75415 


beta-2-microglobulin


1044 


uhuiGGiTa 


99785 


Homo sapiens cDNA: FLJ21245 fis, clone COL01184


1045 


TTAAATGGAA 


33944 


QJIi> « vrcqiwiy smuiiu m nypumciicai protein m_jzu4«v [Homo sapiens] [H.sapiens] 


1046 


TTAAATGGAA 


351593 


fibrinogen, A alpha polypeptide 


1047 


CTTAAAAAAA 


306309 


Homo sapiens mRNA; cDNA DKFZp566L0824 (from clone DKFZp566L0824) 


1048 


CTTAAAAAAA 


75063 


human immunodeficiency virus type 1 enhancer binding protein 2 


1049 


CTTCTCCAAA 


151242 


serine (or cysteine) proteinase inhibitor, clade G (C1 inhibitor), member 1, (angioedema, hereditary)


1050 


CTTCTCCAAA 


6671 


COP9 constitutive photomorphogenic homolog subunit 4 (Arabidopsis) 


1051


TACCTGCAGA 


100000 


S100 calcium binding protein A8 (calgranulin A)


1052 


ATAATAAAAG 


89690 


GRO3 oncogene


1053 


ATAATAAAAG 


250879 


Homo sapiens cDNA FLJ25968 fis, clone CBR01977


1054 


AGAAAGATGT 


352541


hypothetical protein MGC29937


1055 


AGAAAGATGT 


78225 


annexin A1
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Table 3. Genes employed for the clustering analysis shown in Fig. 3B 



SEQ ID
NO:


Tag


Unigene


Gene name


1056 


GTGCGGAGGA 


332053 


serum amyloid A1


1057 


GTGCGGAGGA 


336462 


serum amyloid A2 


1058 


GGAAAAGTGG 


265317 


hypothetical protein MGC2562 


1059 


GGAAAAGTGG 


297681 


serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1


1060 


AATAGGTCCA 


113029


ribosomal protein S25 


1061 


AATAGGTCCA 


356801


ESTs, Weakly similar to T08568 ribosomal protein S25, cytosolic 


1062


GTiTATUOAT 


365706 


matrix Gla protein 


1063 


CAACAATAAT 


283683 


chromosome 8 open reading frame 4 


1064 


riTArriTAA 


46452


secretoglobin, family 2A, member 2 


1065 


CTTCCTGTGA 


348419 


small breast epithelial mucin 


1066 


TAAAAACTTT


204096 


secretoglobin, family 1D, member 2


1067 


TAAAAACTTT 


343411 


Homo sapiens mRNA; cDNA DKFZp586K2322 (from clone DKFZp586K2322)


1068


ACACAGCAAG 


27115 


ESTs, Weakly similar to SFRB HUMAN Splicing factor arginine/serine-rich 11 (Arginine-rich 54 kDa
nuclear protein) (P54) [H.sapiens]


1069 


TGCAGCACGA 


277477 


major histocompatibility complex, class I, C 


1070 


TGCAGCACGA 


110309 


major histocompatibility complex, class I, F 


1071 


ACTCCAAAAA 


356465 


ESTs, Moderately similar to S71259 ribosomal protein S15, cytosolic


1072 


ACTCCAAAAA 


344078 


Homo sapiens, clone IMAGE:3840457, mRNA


1073 


GCCTCCTCCC 


283781 


muscle specific gene 


1074 


GCCTCCTCCC 


319084 


EST 


1075 


AAGCTCGCCG 


62492 


secretoglobin, family 3A, member 1, HIN-1


1076 


CCTGGTCCCA 


23881 


keratin 7 


1077 


CCTGGTCCCA 


167679 


SH3-domain binding protein 2 


1078 


GAATTAACAT 


79474 


tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, epsilon polypeptide 


1079


GAATTAACAT 


90073 


CSE1 chromosome segregation 1-like (yeast)


1080 


TAATTTGCGT 


79368 


epithelial membrane protein 1 


1081 


TTGGTTTTTG 


164021 


small inducible cytokine subfamily B (Cys-X-Cys), member 6 (granulocyte chemotactic protein 2) 


1082 


nxaarrmxj 


170088 


SLC2A4 regulator 


1083 


GCTTGCAAAA 


6823 


neuropilin (NRP) and tolloid (TLL)-like 2


1084 


GCTTGCAAAA 


372783 


superoxide dismutase 2, mitochondrial


1085 


GCCGCCCTGC 


76394 


enoyl Coenzyme A hydratase, short chain, 1, mitochondrial 


1086 


GCCGCCCTGC 


82208 


acyl-Coenzyme A dehydrogenase, very long chain 


1087 


CTTCCAGCTA 


217493 


annexin A2 


1088 


CTTCCAGCTA 


101651 


Homo sapiens mRNA; cDNA DKFZp434C107 (from clone DKFZp434C107)


1089 


CGAATGTCCT 


335952 


keratin 6B 


1090 


TTGAAACTTT 


789 


GRO1 oncogene (melanoma growth stimulating activity, alpha)


1091 


TTGAAGCTTT 


302738 


Homo sapiens cDNA: FLJ21425 fis, clone COL04162


1092 


CCCGGGAGCG 


75807 


PDZ and LIM domain 1 (elfin)


1093 


CCCGGGAGCG


273186 


chaperone, ABC1 activity of bc1 complex like (S. pombe)


1094 


GGACTCTGGA 


71 


alpha-2-glycoprotein 1, zinc 


1095 


GGACTCTGGA 


56023 


brain-derived neurotrophic factor 


1096 


GTCTTAAAGT 


177781 


Homo sapiens, clone IMAGE:4711494, mRNA


1097 


CAGCTCACTG 


738 


ribosomal protein L14 


1098 


CAGCTCACTG 


356012 


ESTs, Weakly similar to T06039 ribosomal protein L14 homolog T24A18.40 
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Example 3. Molecular Markers in DCIS 
To determine if there are genes that are statistically significantly more likely to be
expressed in DCIS than in invasive tumors (and vice versa), various statistical tests were
performed (see Example 1). Based on these analyses, the levels of expression of CD74 and a
SAGE tag (CTGGGCGCCC) (SEQ ID NO: 1109) with no database match were found to be
significantly greater in invasive or metastatic tumors than in DCIS (p=0.02 and p=0.05,
respectively, Table 4). The samples studied were the same as those shown in Table 1; the
sample designated "M1" in Table 4 was the same as that designated "MET" in Table 1. The
expression of MGC23280, DBC-1, and eight other genes was also more likely to occur in
invasive/metastatic tumors than in DCIS, but none of these differences in expression reached
statistical significance (Table 4). Similarly, the expression of S100A7 and keratin 19 ("KRT19")
was more frequent and at higher levels in DCIS than in invasive/metastatic tumors, but this
difference in expression was only marginally statistically significant.

In a second statistical analysis, ROC (receiver operating characteristic) curve analysis
was used to choose the "best cut-off" for values, i.e., the cut-off that results in the most samples
being correctly classified as DCIS or invasive, weighing both kinds of misclassification equally
(Table 4). Tags that do not include 0.50 in the confidence interval (CI) could be useful for the
differential diagnosis of in situ versus invasive carcinomas. Such tags include all those with
p ≤ 0.13 using the higher of two normals' cut-off as well as 3 other high in DCIS tags and 3 other
high in invasive tags (Table 4). Using the best cut-off values, several of the SAGE tags correctly
classified most of the DCIS and invasive SAGE libraries. For example, KRT19 expression
classified 75% of the DCIS and 0% of the invasive libraries as DCIS, while MGC23280
expression diagnosed 78% of the invasive cancer and 0% of the DCIS libraries as "invasive".
Thus, MGC23280 expression had 78% sensitivity and 100% specificity to correctly categorize
breast tumors as DCIS or invasive/metastatic in this data set.
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Next, 26 genes that appeared to be the most highly differentially expressed between 
normal and DCIS samples or between intermediate (D2) and high-grade (D1) DCIS at p ≤ 0.001
using the SAGE 2000 software were selected for further validation studies (Table 5). It was
hypothesized that genes most highly differentially expressed between normal and DCIS tissue or
two different types of DCIS tumors could be used as molecular markers for defining biologically
and potentially clinically meaningful subgroups of DCIS. This concept was supported by the
observation that clustering analysis of the eight DCIS libraries using only these 26 genes gave a
dendrogram (Fig. 3C) that was almost identical to that obtained using 582 genes (Fig. 3B). In
Table 5, the samples shown are the same as those shown in Table 4 and the column labeled
"Method" indicates the technique used to validate the conclusions of the relevant SAGE data
(ISH, in situ hybridization; IH, immunohistochemistry; ND, not done).
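
The cut-off selection described in this Example can be illustrated with a minimal Python sketch. It assumes, as the text states, that the best cut-off is the one that correctly classifies the most libraries. The function name `best_cutoff`, the rule that a library is called invasive when its tag count is at or above the cut-off, and the tag counts used in the usage note are hypothetical and are not taken from Table 4.

```python
from typing import List, Tuple


def best_cutoff(dcis: List[float], invasive: List[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) for the cut-off that
    classifies the most libraries correctly.

    Assumption: a library is called "invasive" when its normalized tag
    count is >= cutoff, and both kinds of misclassification count equally.
    """
    # Every observed value is a candidate cut-off.
    candidates = sorted(set(dcis) | set(invasive))
    best = None
    for c in candidates:
        true_pos = sum(v >= c for v in invasive)  # invasive called invasive
        true_neg = sum(v < c for v in dcis)       # DCIS called DCIS
        correct = true_pos + true_neg
        # Strict ">" keeps the lowest cut-off when several tie.
        if best is None or correct > best[0]:
            best = (correct, c, true_pos / len(invasive), true_neg / len(dcis))
    return best[1], best[2], best[3]


# Hypothetical normalized tag counts for one tag (not data from Table 4).
dcis_counts = [0, 0, 1, 2, 0, 1, 0, 3]
invasive_counts = [0, 5, 7, 9, 12, 6, 8, 10, 0]
cutoff, sensitivity, specificity = best_cutoff(dcis_counts, invasive_counts)
print(cutoff, round(sensitivity, 2), specificity)
```

With these made-up counts, seven of nine invasive libraries and none of the eight DCIS libraries fall at or above the selected cut-off, which mirrors the form of the 78% sensitivity and 100% specificity reported for MGC23280.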
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. Example 4. Confirmation of SAGE Gene Expression Studies by mRNA in situ Hybridization 
mRNA in situ hybridization determines gene expression at the cellular level and is
particularly useful in solid tumors that are heterogeneous in cellular composition. Eighteen
frozen DCIS and invasive breast cancer samples were used for such a study. Whenever possible,
tumors were selected to include normal, DCIS, and invasive components on the same slide in
order to obtain expression data in these three stages of breast tumorigenesis. Examples of in situ
hybridization results are depicted in Fig. 4A. Interestingly, the upregulation in expression of
several genes in DCIS occurred mostly, or exclusively, in non-epithelial cells. Specifically,
CTGF (Connective Tissue Growth Factor) and RGS5 (Regulator of G protein Signaling 5) were
highly expressed in DCIS myoepithelial cells and stromal fibroblasts; in certain tumors
expression was upregulated in DCIS epithelial cells as well (Fig. 4A). Cumulative scores for in
situ hybridization were used for hierarchical clustering analysis and statistical tests. A
dendrogram of the 18 different tumors and 5 normal breast tissues showed that, using the
expression of 14 genes, it was possible to distinguish between normal and cancer samples and
group the tumors into subclasses (Fig. 4B). Although a clustering analysis of gene expression
profiles obtained by in situ hybridization in DCIS of different grades contained some
inconsistent associations, there was an indication that, as shown by the clustering analysis of
DCIS tumors using SAGE data, DCIS tumors of a particular grade were more similar to each
other with respect to the expression of the 14 genes than they were to DCIS tumors of a different
grade (data not shown). The expression of no single gene was found to distinguish between
DCIS and invasive tumors; this finding confirmed the results of the SAGE analysis described
above. Surprisingly, in the majority of cases, the in situ and invasive areas within particular
tumors did not always show the highest similarity to each other (Fig. 4B). This result is
consistent with the idea that gene expression profiles are not the same during tumor progression.

Fisher's exact test revealed significant positive correlation between the expression of
TFF3 and IFI-6-16 (p=0.01), and LOC51235 and BEX1 (p=0.05), while inverse correlation was
found between the expression of S100A7 and RGS5 (p=0.04), S100A7 and TFF3 (p=0.04),
and CTGF and TM4SF1 (p=0.01). No statistically significant associations were found between
the expression of any of these genes and histo-pathologic features of the tumors.
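
Associations of this kind are tested on a 2x2 table of tumors scored positive or negative for each of two genes. The following Python sketch computes a two-sided Fisher's exact p-value directly from the hypergeometric distribution. The function name and the counts are illustrative assumptions, not data from this study, and the two-sided convention used here (summing every table that is no more probable than the observed one) is one common choice that the text does not specify.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: gene 1 positive / negative; columns: gene 2 positive / negative.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the
    # observed one; the small tolerance guards against rounding error.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 8 tumors positive for both genes, 2 positive for
# gene 1 only, 1 positive for gene 2 only, 5 negative for both.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 4))
```

A positive correlation corresponds to an excess of tumors on the diagonal of the table (both genes positive or both negative), and an inverse correlation to an excess off the diagonal; the p-value alone does not indicate the direction.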
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Example 5. Immunohistochemical Analysis of Gene Tissue 

Microarrays and Clinicopathologic Associations 
The expression of 10 genes was analyzed by immunohistochemistry using tissue 
microarrays composed of tumors of different pathologic stages. In total, 788 tumor samples (675 
primary invasive tumors, 33 metastases, 71 pure DCIS, and 9 DCIS with concurrent invasive 
carcinoma) obtained from eight different cohorts (tissue microarrays) were analyzed. Expression 
of all 10 genes was not analyzed in all cohorts. An example of immunohistochemical staining of 
a DCIS with antibodies specific for 5 gene products is depicted in Fig. 4C. 

Cumulative scores for immunohistochemical staining were used for statistical analyses to 
determine associations between the expression of the genes and histo-pathologic features of the
tumors or between different genes. In addition, S100A7 expression was analyzed with respect to 
clinical outcome (overall survival and distant metastasis free survival) in two of the patient 
cohorts. 

As shown by the above-described SAGE analyses, the expression of IBC-1 was almost
exclusively limited to a subset of invasive breast carcinomas, with only 2 out of 80 DCIS tumors
showing detectable IBC-1 expression (Fig. 4C and data not shown). The expression of CTGF,
TFF3, and SPARC in the stroma was statistically significantly related to pathologic stage, with
TFF3 and SPARC being less likely to be expressed in DCIS than in invasive or metastatic
tumors (Table 6). A statistically significant association between S100A7 expression and estrogen
receptor (ER) negativity, high histologic grade, and more than 4 positive lymph nodes was
demonstrated in logistic regression models in primary invasive tumors (Table 6). Since all these
tumor characteristics are known to correlate with poor prognosis, it is likely that S100A7
expression identifies a clinically meaningful subgroup of tumors. Kaplan-Meier analysis
demonstrated decreased overall survival for patients with S100A7 positive tumors, but this did
not reach statistical significance (p=0.41), possibly due to relatively short patient follow-up
and insufficient sample size (data not shown). The expression of fatty acid synthase (FASN) was
higher in ER negative and HER2 positive high-grade tumors, while the expression of SPARC
(osteonectin) inversely correlated with high histologic grade and TNM stage 3 (Table 6). The
fraction of breast tumors that expressed the cytokines CXCL1 (GRO1), CXCL2 (GRO2), and IL-8
was, as expected, very low, since the genes encoding them were more highly expressed in
normal mammary epithelium than in breast cancer as assessed by SAGE and
immunohistochemistry (data not shown). Finally, using Fisher's exact test, the expression of
S100A7 was associated with a higher likelihood of expression of FASN (p=9.95x10^-6) and TFF3
(p=0.002), and a lower likelihood of expression of CTGF (p=0.005), while the expression of
FASN was associated with that of TFF3 (p=3.5x10^) and SPARC in the tumor cells (p=4x10^).






Example 6. Analysis of SAGE libraries from epithelial and non-epithelial cells of normal breast

and DCIS tissue

The SAGE analyses described above indicated that, in breast cancer, dramatic changes
occur not only in the cancerous epithelial cells, but also in various stromal cells. Surprisingly, all
these stromal changes were already present in pre-invasive tumors such as DCIS (ductal
carcinoma in situ) that have not yet invaded the surrounding tissues. Interestingly, many of the
genes up-regulated in tumor epithelial or stromal cells encode secreted proteins (Connective
Tissue Growth Factor, Trefoil Factor 3, Osteonectin, IGFBP-7, etc.), implicating autocrine and/or
paracrine regulatory loops among epithelial and stromal cells. Based on these results it was
concluded that a comprehensive analysis of the gene expression profile of each cell type found in
normal breast tissue and DCIS tissue, combined with the analysis of the genetic changes present
in these cells, would yield important new information on the role of epithelial-stromal
interactions in breast tumorigenesis and would help define the cell type of origin of breast
carcinomas. In addition, genes and pathways identified by such an approach will likely represent
excellent candidate therapeutic targets.

Analysis of SAGE libraries from epithelial and non-epithelial cells from normal breast
tissue and DCIS tumors identified 35 tags that are significantly (p ≤ 0.002) differentially
expressed between leukocytes (Table 7), 333 tags that are significantly (p ≤ 0.002) differentially
expressed between myoepithelial cells (Table 8), 146 tags that are significantly (p ≤ 0.002)
differentially expressed between luminal epithelial cells (Table 9), and 175 tags that are
significantly (p ≤ 0.002) differentially expressed between endothelial cells (Table 10) isolated
from normal and two different DCIS tissues. In Tables 7-10, data obtained with normal breast
tissue (NL) and one DCIS sample (Table 10: D6) or two DCIS samples (Tables 7-9: D6 and D7)
are shown. The numbers of tags shown are normalized values (see Example 1). The ratio of the
number of tags obtained from cells isolated from DCIS tissue to the number obtained with cells
from normal breast tissue (d/n, d6/n, or d7/n) is shown for each tag. The tables also include the
Unigene numbers and the names of previously identified genes. Where no Unigene number is
shown, the relevant gene has not previously been identified.

Analysis of the SAGE data confirmed the findings of the RT-PCR analysis (see Example 
1 and Figure 2) that the cell purification procedure worked well in that certain genes known to be 
expressed in the cell types of interest were represented in the relevant SAGE libraries. For 
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example, the leukocyte libraries had the highest level of expression of several immunoglobulin 
and certain interleukins, while the levels of IGFBP-7, hevin, and selectin E (endothelial cell 
adhesion molecule) were highest in the endothelial cell SAGE libraries. Interestingly, keratins 7 
and 17 were highly abundant in the normal, but significantly decreased in the DCIS, 
myoepithelial libraries, suggesting that maintaining the normal differentiation state of 
myoepithelial cells may require the presence of normal luminal mammary epithelial cells. For 
many of the genes, there was at least a 10-fold difference in expression between normal and one 
or both DCIS tissues tested; in Tables 7-10 the relevant genes are indicated by the symbol "d" at 
the end of the relevant tag sequence. Furthermore, at least among differentially expressed genes 
that were previously known, 44 in the endothelial, 11 in the leukocyte, 82 in the myoepithelial, 
and 29 in the luminal epithelial cells encode proteins that are either secreted or expressed on the 
cell surface and thus likely to be involved in epithelial-stromal cell interactions that regulate (up 
or down) tumor development and/or progression; Tables 11, 12, 13, and 14 list the relevant genes 
in leukocytes, myoepithelial cells, luminal epithelial cells, and endothelial cells, respectively. 
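
As a minimal sketch of how the "d" marker described above might be read from a table entry, the following function separates a tag sequence from a trailing "d". It assumes 10-base tags, as the tags listed in Tables 9 and 10 appear to be; the function name and the validation rule are illustrative assumptions only.

```python
def parse_tag(entry, tag_length=10):
    """Split a table entry such as 'CTTCAAAAAAd' into (sequence, tenfold_flag)."""
    text = entry.replace(" ", "").strip()
    flagged = text.endswith("d")          # lower-case "d" marks a >= 10-fold difference
    sequence = text[:-1] if flagged else text
    if len(sequence) != tag_length or set(sequence) - set("ACGT"):
        raise ValueError(f"not a valid {tag_length}-base tag: {entry!r}")
    return sequence, flagged
```

Entries that do not consist of exactly ten bases from A, C, G, and T raise an error, which is one way of catching tags damaged in transcription.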
Table 7. Genes differentially expressed in leukocytes from DCIS and normal breast tissue 

Gene 
cysteine-rich protein 1 (intestinal) 
major histocompatibility complex, class II, DP beta 1 
vimentin 
plasminogen activator, urokinase receptor 
Lysosomal-associated multispanning membrane protein-5, haematopoetic cell specific 

Table 8. Genes differentially expressed in myoepithelial cells from DCIS and normal breast tissue 

Table 9. Genes differentially expressed in luminal epithelial cells from DCIS and normal breast tissue 

Gene 
ribosomal protein S24, reliable 3' end 
cofilin 1 (non-muscle), internal tag 
eukaryotic translation elongation factor 1 alpha 1, internal tag 
[illegible] 
Activating transcription factor 3, reliable 3' end 
Human Tis11d gene, reliable 3' end 
nucleophosmin (nucleolar phosphoprotein B23, numatrin), reliable 3' end 
Cation-chloride cotransporter-interacting protein, reliable 3' end 
Homo sapiens, clone IMAGE:3840457, mRNA, reliable 3' end 
interleukin 8, reliable 3' end 
Splicing factor, arginine/serine-rich 10 (transformer 2 homolog, Drosophila), reliable 3' end 
acyl-Coenzyme A dehydrogenase, very long chain, reliable 3' end 
Glutamic-oxaloacetic transaminase 1, soluble (aspartate aminotransferase 1), reliable 3' end 
Ribosomal protein S3, reliable 3' end 
GRO2 oncogene, reliable 3' end 
ribosomal protein L17, reliable 3' end 
[illegible] 
SH3-domain GRB2-like 1, reliable 3' end 
Homo sapiens mRNA; cDNA DKFZp547C162 (from clone DKFZp547C162), reliable 3' end 
Ribosomal protein L30, reliable 3' end 
ribosomal protein L15, shorter alternative transcript 
ribosomal protein S19, reliable 3' end 
[illegible] 
ribosomal protein L32, reliable 3' end 
HLA-C Major histocompatibility complex, class I, C, reliable 3' end 

Unigene 
180450, 180370, 181165, 182825, [illegible], U07802, 9614, 119178, BC012990, [illegible], 
30035, 82208, 252259, 75765, 82202, [illegible], 91379, 97616, 376798, 334807, 74267, 
298262, 77028, [illegible], 356795, 169793, 277477, X93334 

Tag sequence 
[illegible], CTGCCAACTT, CAAGTTTGCT d, [illegible], CGCCGCCGGC, GTAAAAAAAA, 
TAGAAAGGCA, TGAAATAAAA, TGAAAAAAAA, ACTCCAAAAA, TGGAAGCACT d, GATGAACTGA, 
GCCGCCCTGC, AGAAAAAAAA, CCCCAGCCAG, TTGAAGCTTT d, AGCTCTCCCT, CAAAAAAAAA, 
CCCATCCGAA, AGGGGCGCAG, GTCTGCACCT, CCAGAACAGA, GTGTTAACCA, CTGGGTTAAT, 
GTCTTAAAGT d, AGAGAAATTT, CTTCGAAACT, TTGGTCCTCT, [illegible], GTGCGCTGAG, 
GGGAAGCAGA 

SEQ ID NO: 
1539, 1540, 1541, 1543, 1544, 1545, 1546, 1547, 1548, 1549, 1550, 1551, 1552, 1553, 
1554, 1555, 1556, 1557, [illegible], 1559, 1560, 1561, 1562, 1563, 1564, 1565, 1566 






Table 9 (continued). Genes differentially expressed in luminal epithelial cells from DCIS and normal breast tissue 

Gene 
Mannosidase, beta A, lysosomal-like, reliable 3' end 
CD81 antigen (target of antiproliferative antibody 1), reliable 3' end 
hypothetical protein FLJ22833, internally primed site 
Prolactin-induced protein, internal tag 
stanniocalcin 2, reliable 3' end 
leukemia inhibitory factor (cholinergic differentiation factor), internal tag 
DERMO1 Likely ortholog of mouse and rat twist-related bHLH protein Dermo-1, reliable 3' end 
[illegible] 
te91a02.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2094026 3', mRNA sequence, undefined 3' end 
protease inhibitor 3, skin-derived (SKALP), reliable 3' end 
[illegible] 
superoxide dismutase 2, mitochondrial, reliable 3' end 
KIAA0781 protein, undefined 3' end 
[illegible] 
aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III), reliable 3' end 
[illegible] 
Ts translation elongation factor, mitochondrial, reliable 3' end 
human immunodeficiency virus type I enhancer binding protein 2, reliable 3' end 
[illegible] 
aquaporin 3, reliable 3' end 
za61g08.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:297086 5' similar to gb:X54486_rna1 PLASMA PROTEASE C1 INHIBITOR PRECURSOR (HUMAN);, mRNA, undefined 3' end 
Small inducible cytokine subfamily B (Cys-X-Cys), member 6 (granulocyte chemotactic protein 2), reliable 3' end 
serum amyloid A1, reliable 3' end 

Unigene 
[illegible], 54457, 118183, 99949, 155223, 2250, 32366, 75498, AI420761, 112341, 
[illegible], 278431, 372783, 42676, 350470, 201967, T69914, 340959, 75063, 69771, 
297681, 234642, W03794, 164021, 532053 

Tag sequence 
CTTCAAAAAA d, CTAAAAAAAA d, GGTGAGTTAC d, GTGGTTAAAA d, CCCGAGGCAG d, [illegible], 
GACAAAAAAA d, GAGGGTTTAG d, GCGCGATGCA d, TTGAATCCCC d, [illegible], GCTTGCAAAA d, 
GTGTGGCAGC d, TTTTGTGTGA d, CTGGCCCTCG d, [illegible], TCTCCAACAA d, GGTAAAATTA d, 
CTTAAAAAAA d, GCAGGCCAAG d, [illegible], CTTCTCCAAA d, [illegible], [illegible] 

SEQ ID NO: 
1628, 1629, 1630, 1631, 1632, 1633, 1634, [illegible], 1636, 1637, [illegible], 1640, 
[illegible], 1642, 1643, 1644, 1645, [illegible], 1647, [illegible], 1651, 1652, 1653 












Table 9 (continued). Genes differentially expressed in luminal epithelial cells from DCIS and normal breast tissue 

Tag sequence 
ACACAGCAAG d 






Table 10. Genes differentially expressed in endothelial cells from DCIS and normal breast tissue 

Gene 
Heme oxygenase (decycling) 1, reliable 3' end 
[illegible] 
heat shock 10kD protein 1 (chaperonin 10), reliable 3' end 
Tripartite motif-containing 32, internal tag 
insulin-like growth factor binding protein 7, shorter alternative transcript 
[illegible] 
stanniocalcin 1, reliable 3' end 
oo01c02.s1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:15648?8 3' similar to gb:X00737 PURINE NUCLEOSIDE PHOSPHORYLASE (HUMAN);, mRNA sequence, reliable 3' end 
[illegible] 
interferon, gamma-inducible protein 16, reliable 3' end 
Hypothetical protein FLJ10350, reliable 3' end 
hypothetical protein MGC8721, reliable 3' end 
[illegible] 
AV728954 HTC Homo sapiens cDNA clone HTCCGG11 5', mRNA sequence, internal tag 
[illegible] 
small nuclear ribonucleoprotein polypeptides B and B1, reliable 3' end 
[illegible] 
hypothetical protein FLJ20005, reliable 3' end 
major histocompatibility complex, class II, DR beta 5 
hypothetical protein MGC4677, reliable 3' end 
collagen, type IV, alpha 1, reliable 3' end 
epithelial membrane protein 1, reliable 3' end 
a disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 4, reliable 3' end 
[illegible], mitochondrial 

Unigene 
202833, 18792, 1197, 236218, 119206, BG223065, 120932, 25590, [illegible], 74561, 
155530, 177596, [illegible], 278270, AV728954, [illegible], 83753, AW805523, 184634, 
352392, 337986, 119129, [illegible], 211604, X93334 

Tag sequence 
[illegible], TTTGAGGATT d, TAAATAATTT d, GCAGAATAGA d, GATAACTACA d, [illegible], 
AAATTGTTGG d, [illegible], TGCCTCTGTC d, TCTTGATTTA d, GACGACTGAC d, CCCCCTGCCC d, 
[illegible], ACAGTGGGGA d, [illegible], CATTTCAGAG d, GGATTGTCTG d, TTAGTGTCGT d, 
AGGAACTGTA d, ACAGCGCTGA d, GGCTGGTCTG d, GACCGCAGGA d, [illegible], [illegible] 

SEQ ID NO: 
1657, 1658, 1659, 1660, 1661, 1662, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1672, 
1673, 1674, 1675, 1676, 1677, 1678, 1679, 1680, [illegible], 1682, 1683, 1684, 1685 






Table 10 (continued). Genes differentially expressed in endothelial cells from DCIS and normal breast tissue 

Tag sequence 
TGGAAAGTGA, GACCAGCAGA, CTAAAATAGT 

Table 11. Genes from Table 7 encoding secreted and cell surface proteins 

Unigene      Gene 
375570       HLA-DRB1, major histocompatibility complex, class II, DR beta 1 
126256       interleukin 1, beta 
76807        major histocompatibility complex, class II, DR alpha 
73817        small inducible cytokine A3 
169401       apolipoprotein E 
79356        Lysosomal-associated multispanning membrane protein-5, haematopoetic cell specific 
179657       plasminogen activator, urokinase receptor 
17409        cysteine-rich protein 1 (intestinal) 
74631        basigin (OK blood group), leukocyte activation M6 antigen 
814          major histocompatibility complex, class II, DP beta 1 
352107       trefoil factor 3 (intestinal) 

Table 12. Genes from Table 8 encoding secreted or cell surface proteins 

Unigene      Gene 
119571       Collagen, type III, alpha 1 (Ehlers-Danlos syndrome type IV, autosomal dominant), shorter alternative transcript 
172928       collagen, type I, alpha 1, internally primed site 
             thrombospondin 2, reliable 3' end 
             H factor (complement)-like 1, reliable 3' end 
             collagen, type VI, alpha 2, reliable 3' end 
265827       G1P3 interferon alpha-inducible protein, reliable 3' end, 97%, IFI-6-16, secreted based on PSORT 
             microfibrillar-associated protein, undefined 3' end 
274313       insulin-like growth factor binding protein 6, reliable 3' end 
75736        apolipoprotein D, reliable 3' end 
[illegible]  cathepsin [illegible], reliable 3' end 
76152 
             collagen, type I, alpha 1, shorter alternative transcript 
821 
             fibulin, transcript variant C, reliable 3' end 
             C1R Complement component 1, r subcomponent, reliable 3' end 
             HLA-C Major histocompatibility complex, class I, C, reliable 3' end 
             collagen triple helix repeat containing 1, reliable 3' end 
             complement component (3b/4b) receptor 1, including Knops blood group system, reliable 3' end 
             CD81 antigen (target of antiproliferative antibody 1), reliable 3' end 
             interleukin 6 (interferon, beta 2), reliable 3' end 
101382 
[illegible]  collagen, type VI, alpha 1, reliable 3' end 
             apolipoprotein E, undefined 3' end 
227751       lectin, galactoside-binding, soluble, 1 (galectin 1), reliable 3' end 
296267       follistatin-like 1, reliable 3' end 
119178       Cation-chloride cotransporter-interacting protein, reliable 3' end 
136348       Osteoblast specific factor 2 (fasciclin I-like), undefined 3' end 
111301       Matrix metalloproteinase 2 (gelatinase A, 72kD gelatinase, 72kD type IV collagenase), reliable 3' end 
75415        beta-2-microglobulin, reliable 3' end 
62954        Ferritin, heavy polypeptide 1, reliable 3' end 
287797       integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12), reliable 3' end 
74471        Gap junction protein, alpha 1, 43kD (connexin 43), reliable 3' end 
8867         cysteine-rich, angiogenic inducer, 61, reliable 3' end 
87409        thrombospondin 1, reliable 3' end 
23582        tumor-associated calcium signal transducer 2, reliable 3' end 
624          interleukin 8, reliable 3' end 
82689        tumor rejection antigen (gp96) 1, reliable 3' end 
1369         Decay accelerating factor for complement (CD55, Cromer blood group system), reliable 3' end 
171921       sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3C, reliable 3' end 
303649       small inducible cytokine A2 (monocyte chemotactic protein 1), reliable 3' end 
77356        transferrin receptor (p90, CD71), reliable 3' end 
9006         VAMP (vesicle-associated membrane protein)-associated protein A (33kD), reliable 3' end 
6418         seven transmembrane domain orphan receptor, reliable 3' end 
78614        complement component 1, q subcomponent binding protein, reliable 3' end 
287797       ITGB1 Integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12), internally primed site 
75765        GRO2 oncogene, reliable 3' end 
78225        annexin A1, reliable 3' end 
2820         oxytocin receptor, reliable 3' end 
117938       Collagen, type XVII, alpha 1, reliable 3' end 
289114       hexabrachion (tenascin C, cytotactin), reliable 3' end 
799          diphtheria toxin receptor (heparin-binding epidermal growth factor-like growth factor), reliable 3' end 
2250         leukemia inhibitory factor (cholinergic differentiation factor), reliable 3' end 
198689       bullous pemphigoid antigen 1 (230/240kD), reliable 3' end 
8230         a disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 1, reliable 3' end 






Table 13. Genes from Table 9 encoding secreted or cell surface proteins 

Unigene      Gene 
277477       HLA-C Major histocompatibility complex, class I, C, reliable 3' end 
332053       serum amyloid A1, reliable 3' end 
164021       Small inducible cytokine subfamily B (Cys-X-Cys), member 6 (granulocyte chemotactic protein 2), reliable 3' end 
297681       serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1, reliable 3' end 
69771        B-factor, properdin, reliable 3' end, complement factor 
350470       Trefoil factor 1 (breast cancer, estrogen-inducible sequence expressed in), reliable 3' end 
112341       protease inhibitor 3, skin-derived (SKALP), reliable 3' end 
75498        small inducible cytokine subfamily A (Cys-Cys), member 20, reliable 3' end 
2250         leukemia inhibitory factor (cholinergic differentiation factor), internal tag 
155223       stanniocalcin 2, reliable 3' end 
54457        CD81 antigen (target of antiproliferative antibody 1), reliable 3' end 
234726       serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 3, reliable 3' end 
62492        HIN-1, secretoglobin, family 3A, member 1, reliable 3' end 
89690        GRO3 oncogene, reliable 3' end 
204096       secretoglobin, family 1D, member 2, reliable 3' end 
278573       CD59 antigen p18-20 (antigen identified by monoclonal antibodies 16.3A5, EJ16, EJ30, EL32 and G344), reliable 3' end, similarity to urokinase plasminogen activator receptor 
621          lectin, galactoside-binding, soluble, 3 (galectin 3), reliable 3' end 
789          GRO1 oncogene (melanoma growth stimulating activity, alpha), reliable 3' end 
93913        interleukin 6 (interferon, beta 2), reliable 3' end 
348419       LOC118430 Small breast epithelial mucin, undefined 3' end 
75106        clusterin (complement lysis inhibitor, SP-40,40, sulfated glycoprotein 2, testosterone-repressed prostate message 2, apolipoprotein J), reliable 3' end 
277477       HLA-C Major histocompatibility complex, class I, C, reliable 3' end, 97% 
75765        GRO2 oncogene, reliable 3' end 
624          interleukin 8, reliable 3' end 
119178       Cation-chloride cotransporter-interacting protein, reliable 3' end 
5372         claudin 4, reliable 3' end 
306226       Transmembrane gamma-carboxyglutamic acid protein 4, reliable 3' end 
31439        serine protease inhibitor, Kunitz type, 2, reliable 3' end 
323910       V-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian), undefined 3' end 

Table 14. Genes from Table 10 encoding secreted or cell surface proteins 

Unigene      Gene 
119206       insulin-like growth factor binding protein 7, shorter alternative transcript 
16085        putative G protein coupled receptor, reliable 3' end 
25590        stanniocalcin 1, reliable 3' end 
74561        alpha-2-macroglobulin, reliable 3' end 
1516         insulin-like growth factor binding protein 4, undefined 3' end 
352392       major histocompatibility complex, class II, DR beta 5 
119129       collagen, type IV, alpha 1, reliable 3' end 
79368        epithelial membrane protein 1, reliable 3' end 
211604       a disintegrin-like and metalloprotease (reprolysin type) with thrombospondin type 1 motif, 4, reliable 3' end 
119206       insulin-like growth factor binding protein 7, reliable 3' end 
1908         proteoglycan 1, secretory granule, reliable 3' end 
74471        Gap junction protein, alpha 1, 43kD (connexin 43), reliable 3' end 
624          interleukin 8, reliable 3' end 
89546        selectin E (endothelial adhesion molecule 1), reliable 3' end 
168383       intercellular adhesion molecule 1 (CD54), human rhinovirus receptor, reliable 3' end 
298275       solute carrier family 38, member 2, reliable 3' end 
78409        collagen, type XVIII, alpha 1, shorter alternative transcript 
277477       Major histocompatibility complex, class I, C, reliable 3' end 
75445        SPARC-like 1 (mast9, hevin), reliable 3' end 
111334       Ferritin, light polypeptide, reliable 3' end 
351316       Transmembrane 4 superfamily member 1, reliable 3' end 
111779       secreted protein, acidic, cysteine-rich (osteonectin), reliable 3' end 
75415        beta-2-microglobulin, reliable 3' end 
181357       laminin receptor 1 (67kD, ribosomal protein SA), reliable 3' end 
172928       collagen, type I, alpha 1, internally primed site 
300697       immunoglobulin heavy constant gamma 3 (G3m marker), reliable 3' end 
119571       Collagen, type III, alpha 1 (Ehlers-Danlos syndrome type IV, autosomal dominant), shorter alternative transcript 
75111        protease, serine, 11 (IGF binding), similar to IGFBP7, cleaves IGF 
75511        connective tissue growth factor, undefined 3' end, 79.6% 
193716       Complement component (3b/4b) receptor 1, including Knops blood group system, reliable 3' end 
172928       Collagen, type I, alpha 1, internal tag 
93557        proenkephalin (NCBI only) 
158287       syndecan 3 (N-syndecan) 
89137        Low density lipoprotein-related protein 1 (alpha-2-macroglobulin receptor), reliable 3' end 
83326        matrix metalloproteinase 3 (stromelysin 1, progelatinase), reliable 3' end 
108623       Thrombospondin 1, reliable 3' end 
102171       immunoglobulin superfamily containing leucine-rich repeat, reliable 3' end 
25640        claudin 3, reliable 3' end 
252189       Syndecan 4 (amphiglycan, ryudocan), undefined 3' end 
286124       CD24 antigen (small cell lung carcinoma cluster 4 antigen), reliable 3' end 
BG939135     cn30g02.x1 Normal Human Trabecular Bone Cells Homo sapiens cDNA clone NHTBC_cn30g02 random, mRNA sequence, undefined 3' end 
172928       collagen, type I, alpha 1, internal tag 
23582        tumor-associated calcium signal transducer 2, reliable 3' end 
5372         Claudin 4, reliable 3' end 

Example 7. Analysis of SAGE libraries from epithelial cells and non-epithelial cells of normal 
breast tissue and breast tissues from patients with 
various diseases of the breast 
SAGE analyses were performed on cell types in addition to those described in Example 6 
and on breast tissue from patients with a variety of breast conditions. The data described in 
Example 6 and additional data were analyzed in a manner different from that described in Example 
6. 

To determine the molecular profile of various cell types that are found in normal and 
diseased breast tissue (e.g., cancerous epithelial and non-cancerous stromal cells within a breast 
tumor) and to identify autocrine and paracrine interactions that may play a role in breast tumor 
progression, a purification procedure (similar to that described in Example 1 for the analysis 
described in Example 6) was developed that allows the isolation of pure cell populations from 
normal breast tissue, in situ (DCIS; ductal carcinoma in situ) and invasive breast carcinomas 
(Fig. 5A). Cell type-specific surface markers and magnetic beads were used for the rapid 
sequential isolation of the various cell types. The BerEP4 antigen that is restricted to epithelial 
cells, the CD45 pan-leukocyte marker, and the P1H12 antibody that specifically recognizes 
endothelial cells were exploited for this purpose. The CD10 antigen is present in myoepithelial 
cells and myofibroblasts but also in some leukocytes. Thus, to minimize the cross contamination 
of these different cell types, in the case of normal and DCIS breast tissue, myoepithelial cells 
were isolated from organoids (breast ducts). On the other hand, in invasive tumors, leukocytes 
were removed prior to capturing the myofibroblasts using the CD10 beads. No antibody 
is available that specifically recognizes fibroblasts and thereby facilitates their purification. 
Thus, the unbound fraction, following removal of all other cell types, was used as a 
fibroblast-enriched "stroma" fraction. 

This cell purification protocol includes enzymatic digestion of the tissue, and the 
possibility that the expression of some genes could be altered due to the procedure cannot be 
excluded. However, in that it was possible to verify the SAGE data by alternative methods 
using unprocessed tissue (see below), any such hypothetical changes are likely to be minimal. 
The success of the purification method and the purity of each cell fraction were confirmed by 
performing RT-PCR on a small fraction of the isolated cells using cell type-specific genes as was 
done for the cell fractions described in Example 6 (see Example 1). The remaining portion of the 
cells (~10,000-100,000 cells depending on the sample) was used for the generation of 
micro-SAGE libraries following previously described protocols and for the isolation of genomic DNA 
to be used for array-Comparative Genomic Hybridization (aCGH) and Single Nucleotide 
Polymorphism (SNP) array studies [Porter et al. (2003a) Mol. Cancer Res. 1:362-375; Porter et 
al. (2001)]. 

SAGE libraries were generated using a modified micro-SAGE protocol and the I-SAGE 
or long I-SAGE kits from Invitrogen (Carlsbad, CA). Approximately 50,000 tags (mean 
tag number 56,647±4,383) were obtained from each library, and the preliminary analysis of the 
SAGE data was performed essentially as described [Porter et al. (2001)]. Briefly, genes 
significantly (p<0.002) differentially expressed between normal and cancerous cells were 
identified by performing pair-wise comparisons using the SAGE2000 software, which includes 
software to perform Monte Carlo analysis (obtained from Johns Hopkins University, Baltimore, 
MD). 
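
For orientation only, a pair-wise Monte Carlo comparison of one tag between two libraries can be sketched as below. This is a generic simulation under stated assumptions and is not the algorithm implemented in the SAGE2000 software: it assumes that, under the null hypothesis, each occurrence of the tag falls into a library with probability proportional to that library's total tag count. The function name, parameters, and default values are hypothetical.

```python
import random


def monte_carlo_p(count_a, count_b, total_a, total_b, n_sim=10000, seed=0):
    """Estimate a two-sided p-value for one tag observed in two SAGE libraries."""
    rng = random.Random(seed)                  # fixed seed keeps the estimate reproducible
    pooled = count_a + count_b                 # occurrences redistributed under the null
    share_a = total_a / (total_a + total_b)    # chance an occurrence lands in library A
    observed = abs(count_a / total_a - count_b / total_b)
    extreme = 0
    for _ in range(n_sim):
        sim_a = sum(rng.random() < share_a for _ in range(pooled))
        sim_b = pooled - sim_a
        if abs(sim_a / total_a - sim_b / total_b) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_sim + 1)
```

Under these assumptions a tag would be called differentially expressed when the returned estimate falls below the chosen threshold (0.002 in the comparisons described above); the number of simulations limits how small an estimate can be resolved.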

SAGE libraries were generated from epithelial cells, myoepithelial cells (and 
myofibroblasts from invasive tumors), infiltrating leukocytes, endothelial cells, and fibroblasts 
("stroma") from one normal breast reduction tissue, two different DCIS, and three invasive 
breast tumors. Not all libraries were generated from all cases due to the inability to obtain 
sufficient amounts of purified cells. In addition, a fibroadenoma and a phyllodes tumor were 
included in the SAGE analysis. Fibroadenomas are the most common benign breast tumors and 
are not considered to progress to malignancy despite genetic changes detected in the stromal (but 
not epithelial) cells [Amiel et al. (2003) Cancer Genet. Cytogenet. 142:145-148]. Phyllodes 
tumors, on the other hand, are rare fibroepithelial tumors that are usually benign but can recur 
and progress to malignant sarcomas. Phyllodes tumors were initially considered stromal 
neoplasms, but recent molecular studies demonstrating frequently discordant genetic alterations 
in both epithelial and stromal cells suggest that phyllodes tumors may represent a true clonal 
co-evolution of malignant epithelial and stromal cells [Sawyer et al. (2000) Am. J. Pathol. 
156:1093-1098; Sawyer et al. (2002) J. Pathol. 196:437-444]. Analysis of the SAGE data 
confirmed that the cell purification procedure worked well in that several genes known to be 
specific for a particular cell type were present in the appropriate SAGE libraries. For example, 
cytokeratins 8 and 19, E-cadherin, HIN-1, and CD24 were highly specific for epithelial cells; 
myofibroblasts and myoepithelial cells demonstrated high levels of smooth muscle actin, various 
extracellular matrix proteins including collagens, and matrix metalloproteinases; and leukocyte 
libraries had the highest levels of several chemokines and lysozyme. 

Based on statistical methods developed (by bioinformaticians in the Department of 
Research Computing at the Dana-Farber Cancer Institute and the Department of Biostatistics at 
the Harvard School of Public Health) for the analysis of SAGE data, genes that are specifically 
expressed in a particular cell type and tumor progression stage were identified. Genes were 
defined as specific for a particular cell type if the average tag number in all the SAGE libraries 
generated from the selected cell type was statistically significantly (P<0.02) different from that 
of all other cell types. Using these criteria, 357 tags were identified as discriminating epithelial 
cells from other cell types, 572 tags were identified as discriminating myoepithelial cells and 
myofibroblasts from all other cell types, 502 tags were identified as discriminating leukocytes 
from all other cell types, 124 tags were identified as discriminating endothelial cells from all 
other cell types, and 604 tags were identified as discriminating "stromal" cells depleted of all the 
above-listed cell types (i.e., mostly fibroblasts) from all other cell types. 
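
The selection criterion described above can be illustrated with a minimal sketch. The statistical methods actually used are not specified in this description, so the sketch substitutes a simple Poisson likelihood-ratio test comparing the pooled tag rate of one cell type against all other libraries. The function names, the tag names, and the use of per-library total tag counts for normalization are assumptions made for illustration only.

```python
import math

def poisson_two_group_test(counts_a, sizes_a, counts_b, sizes_b):
    """Likelihood-ratio test that one tag has the same rate in two library groups.

    counts_*: raw tag counts per library; sizes_*: total tags per library.
    Returns (p_value, more_abundant_in_a).
    """
    sum_a, sum_b = sum(counts_a), sum(counts_b)
    n_a, n_b = sum(sizes_a), sum(sizes_b)
    rate_a, rate_b = sum_a / n_a, sum_b / n_b
    pooled = (sum_a + sum_b) / (n_a + n_b)
    if pooled == 0:
        return 1.0, False  # tag never observed: no evidence either way

    def term(total, rate):
        # x * log(x / expected), with the convention 0 * log(0) = 0
        return total * math.log(rate / pooled) if total > 0 else 0.0

    stat = max(2.0 * (term(sum_a, rate_a) + term(sum_b, rate_b)), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_value, rate_a > rate_b


def specific_tags(table, sizes, group, alpha=0.02, require_higher=False):
    """table: {tag: [count per library]}; group: library indices of one cell type."""
    others = [i for i in range(len(sizes)) if i not in group]
    hits = []
    for tag, counts in table.items():
        p, higher = poisson_two_group_test(
            [counts[i] for i in group], [sizes[i] for i in group],
            [counts[i] for i in others], [sizes[i] for i in others])
        if p < alpha and (higher or not require_higher):
            hits.append(tag)
    return hits


# Hypothetical toy data: five libraries, the first two from one cell type.
toy_table = {"TAG_EPI": [30, 28, 1, 2, 0], "TAG_FLAT": [5, 5, 5, 5, 5]}
toy_sizes = [50000] * 5
selected = specific_tags(toy_table, toy_sizes, group=[0, 1])
```

Setting `require_higher=True` keeps only tags that are both significantly different and more abundant in the cell type of interest.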

To further define SAGE tags specific for each cell type, within each group of tags, those 
that were not only statistically significantly different, but also more abundant in the specific cell 
type, were selected. This led to the identification of 70 tags that were most abundant in epithelial 
cells, 117 tags present at highest levels in myoepithelial cells and myofibroblasts, 70 tags highly 
expressed in leukocytes, 117 tags in stroma, and 78 endothelium-specific tags. Several of these 
genes have previously been described as being specific for a particular cell type, e.g., keratins 8 
and 19 for epithelial cells, keratins 14 and 17 for myoepithelial cells, and chemokines and 
chemokine receptors for leukocytes [Page et al. (1999) Proc. Natl. Acad. Sci. USA 96:12589-12594].
However, the cell type-specific expression of the majority of the genes has not been
previously documented. The majority of the transcripts corresponding to these cell-type specific 
SAGE tags encode known genes but a significant fraction either are uncharacterized ESTs or 
currently have no cDNA match (~10% of the tags on average belong to each of these latter
groups). In stroma 25/117 tags (21%) had no database match, suggesting that they correspond to
previously unidentified transcripts. 

Next, using the 471 SAGE tags most abundantly expressed or 63 of the SAGE tags most
highly specifically present in each of the five cell types, a clustering analysis of all 27 SAGE
libraries using a new Poisson model-based K-means algorithm (PK algorithm) was performed in
order to delineate similarities and differences among the samples. In addition, a clustering
analysis of the SAGE libraries using each of the cell type-specific genes was performed. The PK
clustering method orders the samples according to their relatedness. For example, using the 63
most highly cell type-specific SAGE tags, a division of the 27 SAGE libraries according to cell
types was obtained and, within each cell type sub-group, the DCIS samples were located between
normal breast tissue and invasive breast cancer SAGE libraries. These results confirmed that
not only tumor epithelial cells, but also other cell types in the tumor are different from their
corresponding normal counterparts. Since these differences are already pronounced at a
pre-invasive (DCIS) tumor stage, they suggest a role for stromal changes not only in tumor invasion
and metastasis, but also in the earlier steps of breast tumorigenesis.
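
The idea of a Poisson model-based K-means clustering of libraries can be sketched as follows. This is not the PK algorithm itself, whose details are not given in this description; it is a minimal stand-in that assigns each library to the cluster whose pooled tag-frequency profile gives the lowest Poisson deviance. The function names, the seeding by library index, and the toy counts are assumptions made for illustration.

```python
import math

def poisson_deviance(counts, profile):
    """Poisson deviance of observed counts against expected = profile * library total."""
    total = sum(counts)
    dev = 0.0
    for y, p in zip(counts, profile):
        mu = max(p * total, 1e-9)  # floor avoids log(0) when a profile entry is zero
        if y > 0:
            dev += y * math.log(y / mu)
        dev -= (y - mu)
    return 2.0 * dev

def cluster_profile(libraries):
    """Pooled relative tag frequencies of the member libraries."""
    n_tags = len(libraries[0])
    pooled = [sum(lib[j] for lib in libraries) for j in range(n_tags)]
    grand = sum(pooled)
    return [c / grand for c in pooled]

def poisson_kmeans(libraries, seeds, max_iter=50):
    """libraries: list of tag-count vectors; seeds: indices of initial cluster centers."""
    profiles = [cluster_profile([libraries[s]]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(profiles)), key=lambda k: poisson_deviance(lib, profiles[k]))
            for lib in libraries
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(profiles)):
            members = [lib for lib, lab in zip(libraries, labels) if lab == k]
            if members:
                profiles[k] = cluster_profile(members)
    return labels

# Hypothetical counts for four tags in four libraries of two cell types.
toy_libraries = [
    [50, 40, 1, 0],
    [100, 90, 3, 1],
    [1, 2, 60, 70],
    [0, 3, 30, 28],
]
toy_labels = poisson_kmeans(toy_libraries, seeds=[0, 2])
```

Because each library is compared with a profile scaled to its own total tag count, libraries of different sequencing depth can be grouped without a separate normalization step.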

The most consistent and dramatic gene expression changes were found to occur in 
myoepithelial cells. Over 300 genes were differentially expressed at p<0.002 in both DCIS 
myoepithelial libraries. Interestingly, a significant fraction (89 out of 245 known genes) of these 
genes encode secreted or cell surface proteins, suggesting extensive abnormal paracrine 
interactions between myoepithelial and other cell types. Myoepithelial cells are thought to be 
derived from bi-potential stem cells that also give rise to luminal epithelial cells, although 
recently another progenitor has also been identified that can differentiate only to myoepithelial 
cells [Bocker et al. (2002) Lab. Invest. 82:737-746; Dontu et al. (2003) Genes Dev. 17:1253-1270].
The function of myoepithelial cells and their role in breast cancer are not well understood.
However, myoepithelial cells have been shown to be able to suppress breast cancer cell growth, 
invasion, and angiogenesis [Deugnier et al. (2002) Breast Cancer Res. 4:224-230; Sternlicht and
Barsky (1997) Clin. Cancer Res. 3:1949-1958]. The main distinguishing feature between in situ 
and invasive carcinomas, which is also used as a diagnostic criterion, is that: (a) in DCIS the 
cancer epithelial cells are separated from the stroma by a nearly continuous layer of 
myoepithelial cells and basement membrane; while (b) in invasive and metastatic tumors cancer 
cells are admixed with stroma. 

In Table 15 are shown the most highly cell type-specific SAGE tags and corresponding 
genes. Columns 1-27 in Table 15 show data obtained from 27 separate libraries generated from
cells from a variety of samples. These samples were:

Columns 1-7 (myoepithelial cells and myofibroblasts):
Column 1: myoepithelial cells isolated from normal breast tissue adjacent to invasive ductal carcinoma (IDC7) tissue.
Column 2: myoepithelial cells isolated from reduction mammoplasty normal breast tissue (RM1).
Column 3: myofibroblasts isolated from an invasive ductal carcinoma (IDC7).
Column 4: myofibroblasts isolated from an invasive ductal carcinoma (IDC8).
Column 5: myofibroblasts isolated from an invasive ductal carcinoma (IDC9).
Column 6: myoepithelial cells isolated from DCIS tissue (D7).
Column 7: myoepithelial cells isolated from DCIS tissue (D6).

Columns 8-10 and 26 (fibroblast-enriched cells):
Column 8: fibroblast-enriched cells from an invasive ductal carcinoma (IDC7).
Column 9: fibroblast-enriched cells from DCIS tissue (D6).
Column 10: fibroblast-enriched cells from reduction mammoplasty normal breast tissue (RM2).
Column 26: fibroblast-enriched cells from a phyllodes tumor.

Columns 11-12 (endothelial cells):
Column 11: endothelial cells isolated from reduction mammoplasty normal breast tissue (RM2).
Column 12: endothelial cells isolated from DCIS tissue (D6).

Columns 13-16 (leukocytes):
Column 13: leukocytes isolated from DCIS tissue (D7).
Column 14: leukocytes isolated from DCIS tissue (D6).
Column 15: leukocytes isolated from an invasive ductal carcinoma (IDC7).
Column 16: leukocytes isolated from reduction mammoplasty normal breast tissue (RM2).

Columns 17-25 (epithelial cells: luminal type):
Column 17: epithelial cells isolated from an invasive ductal carcinoma (IDC7).
Column 18: epithelial cells isolated from an invasive ductal carcinoma (IDC8).
Column 19: epithelial cells isolated from an invasive ductal carcinoma (IDC9).
Column 20: epithelial cells isolated from DCIS tissue (D7).
Column 21: epithelial cells isolated from DCIS tissue (D6).
Column 22: epithelial cells isolated from normal breast tissue adjacent to DCIS (D2) tissue.
Column 23: epithelial cells isolated from reduction mammoplasty normal breast tissue (RM3).
Column 24: epithelial cells isolated from DCIS tissue (D2).
Column 25: epithelial cells isolated from DCIS tissue (D3).

Column 27: unseparated cells of a juvenile fibroadenoma.

Rows 1-72 in Table 15 show SAGE tags detected in the various libraries depicted in
columns 1-27.

Rows 1-27: SAGE tags that were statistically significantly (p < 0.02) more abundantly expressed
in epithelial cells than in all other cell types.

Rows 28-53: SAGE tags that were statistically significantly (p < 0.02) more abundantly
expressed in myoepithelial cells than in all other cell types or in myofibroblasts than in all other
cell types.

Rows 54-58: SAGE tags that were statistically significantly (p < 0.02) more abundantly
expressed in leukocytes than in all other cell types.

Rows 59-65: SAGE tags that were statistically significantly (p < 0.02) more abundantly
expressed in fibroblast-enriched cells than in all other cell types.

Rows 66-72: SAGE tags that were statistically significantly (p < 0.02) more abundantly
expressed in endothelial cells than in all other cell types.

From Table 15 it can readily be determined, by referring to the intersection of relevant
columns and rows, which of the listed genes are differently expressed (more highly or at a lower
level) in the various cell types from DCIS and/or invasive breast cancers compared to
corresponding cell types from normal tissue. Analogous differences in expression between cells
from DCIS and from invasive breast carcinomas can similarly be discerned from the data in
Table 15. It is noted that myofibroblasts are cells found only in cancer tissue and thus
comparisons of gene expression involving myofibroblasts will be between: (a) myofibroblasts in
DCIS and invasive breast carcinomas; or (b) between myofibroblasts in DCIS or invasive breast
carcinomas and any other cell type (e.g., myoepithelial cells or fibroblasts) from normal breast
tissue.

Follow-up studies were focused on myoepithelial cells, with special emphasis on secreted
proteins and receptors abnormally expressed in these cells. Several proteases [e.g., cathepsins F,
K, and L, MMP2 (matrix metalloproteinase 2), and PRSS11 (protease serine (insulin-like growth
factor-binding))], protease inhibitors [thrombospondin 2, SERPING1 (serine (or cysteine)
proteinase inhibitor, clade G (C1 inhibitor) member 1), cystatin C, and TIMP3 (tissue inhibitor
of metalloproteinase 3)], and many different collagens were highly up-regulated in DCIS
myoepithelial cells, suggesting a role for these cells in extracellular matrix remodeling (Table
16).

In Table 16, the column labeled "N-MYOEP-1" shows data obtained from a SAGE 
library generated from myoepithelial cells isolated from reduction mammoplasty normal breast 
tissue (RM1). The columns labeled "D-MYOEP-7" and "D-MYOEP-6" show data obtained
from SAGE libraries generated from myoepithelial cells isolated from two DCIS tissue samples
(D7 and D6, respectively). The column labeled "Ratio D/N" shows the ratio of the average of
the numbers of SAGE tags obtained with the two DCIS tissue samples to the SAGE tag number
obtained with normal breast tissue.
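
The "Ratio D/N" value described above amounts to a one-line calculation, sketched here for clarity. The function name and the handling of a zero count in the normal library are assumptions; this description does not state how such cases were treated.

```python
import math

def ratio_d_over_n(normal, dcis_a, dcis_b):
    """Average of the two DCIS tag counts divided by the normal tag count."""
    dcis_mean = (dcis_a + dcis_b) / 2.0
    if normal == 0:
        # Assumption: a zero denominator is reported as infinity (or NaN for 0/0).
        return math.inf if dcis_mean > 0 else math.nan
    return dcis_mean / normal

# Hypothetical counts: 2 tags in N-MYOEP-1, 30 in D-MYOEP-7, 50 in D-MYOEP-6.
example_ratio = ratio_d_over_n(2, 30, 50)
```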

Array-Comparative Genomic Hybridization (aCGH) and Single Nucleotide 
Polymorphism (SNP) array studies indicated that the changes in gene expression in non-cancer 
cells present in breast tumor tissue detected by the analysis described in Example 6 and this 
Example were not due to chromosomal gains or losses, e.g., loss of heterozygosity. 

Example 8. Evaluation of gene expression by immunohistochemistry and mRNA in situ hybridization

The generation of the SAGE libraries described in Example 7 involved initial in vitro cell 
purification steps that could potentially have altered in vivo gene expression patterns, although 
prior SAGE data from several laboratories suggest that these changes are likely to be minimal 
[Porter et al. (2003a); Porter et al. (2003b) Proc. Natl. Acad. Sci. USA 100:10931-10936; St.
Croix et al. (2000) Science 289:1197-1202]. Nevertheless, in order to further investigate the
expression of selected genes at the cellular level in vivo, immunohistochemical and mRNA in 
situ hybridization analyses were performed on a panel of DCIS and invasive breast tumors 
(different from the tumors used for SAGE). In addition, the cell type specificity of some genes 
was verified by RT-PCR in the samples used for SAGE (data not shown). 

Immunohistochemical analysis confirmed that two genes, those encoding IL-1β and
CCL3 (MIP1α), are highly expressed in leukocytes infiltrating DCIS, but not normal breast
tissue, whereas the CD45 (PTPRC) pan-leukocyte marker was expressed in both cases. Despite
the similar number of total leukocytes in invasive tumors, the frequency of IL-1β and CCL3
positive leukocytes, although higher than in normal breast tissue, was much lower than in DCIS, 
suggesting that in situ and invasive breast carcinomas may be immunologically dissimilar. 

mRNA in situ hybridization determined that in DCIS tumors: (a) the expression of PDGF
(platelet-derived growth factor) receptor β-like (PDGFRBL), cathepsin K (CTSK), and CXCL12
was localized to myofibroblasts as determined by smooth muscle actin (ACTA2) staining; (b)
CXCL14 was expressed only in myoepithelial cells; and (c) TIMP3, cystatin C (CST3) and collagen
triple helix repeat containing 1 (CTHRC1) were expressed in both myoepithelial cells and
myofibroblasts. In invasive tumors all these genes were expressed in myofibroblasts; there are
no myoepithelial cells in invasive breast tumors. No signal was detected in normal breast tissue
or with the sense probes (data not shown). Interestingly, although in DCIS tumors CXCL14
expression was detected only in myoepithelial cells, in some invasive breast carcinomas, while
present in myofibroblasts, it was much more strongly expressed in tumor epithelial cells (data
not shown). Similarly, some breast cancer cell lines expressed high levels of CXCL12 or
CXCL14 in vitro, suggesting that during tumor progression a paracrine factor may be converted
into an autocrine one due to its up-regulation in the tumor epithelial cells. All the CXCL14
positive primary breast tumors and even the CXCL14 expressing breast cancer cell line
(UACC812) were obtained from young, pre-menopausal patients (average age of onset 39 years),
suggesting a possible association of CXCL14 expression with clinico-pathologic characteristics
of the tumors.

Example 9. The effect of CXCL12 and CXCL14 chemokines on breast cancer cells 
The high level of expression of two chemokines, CXCL12 and CXCL14, in 
myoepithelial cells and myofibroblasts, both in DCIS and invasive breast carcinomas, was 
particularly interesting in view of the known function of chemokines as regulators of cell 
proliferation, differentiation, migration, and invasion [Gerard et al. (2001) Nat. Immunol. 2:108- 
115; Muller et al. (2001) Nature 410:50-56; Rossi et al. (2000) Annu. Rev. Immunol. 18:217- 
242]. To determine if CXCL12 and CXCL14 can act as autocrine and/or paracrine factors in 
breast tumors, an analysis to identify cell types expressing receptors for the two chemokines in 
primary breast tissue in vivo was carried out. 

The signaling receptor for CXCL12 is CXCR4, which is known to be expressed in 
various lymphoid cells as well as a variety of epithelial cells [Gerard et al. (2001)]. The 
expression of CXCR4 in lymphoid and breast epithelial cells was confirmed by
immunohistochemistry and SAGE data indicated that its expression is increased in invasive 
tumors compared to DCIS and normal breast tissue (data not shown). 

The signaling receptor for CXCL14 is unknown but cell surface ligand binding
experiments have suggested the presence of a putative CXCL14 receptor on monocytes and
B-cells, suggesting that its receptor is unlikely to be CXCR4 [Kurth et al. (2001) J. Exp. Med.
194:855-861; Sleeman et al. (2000) Int. Immunol. 12:677-689]. To determine if a
CXCL14-binding cell surface protein(s) is also present on breast cancer cells, an alkaline
phosphatase-CXCL14 (AP-CXCL14) fusion protein to be used as a ligand in receptor binding assays was
generated. In this fusion protein the AP was located N-terminal of the CXCL14. Conditioned
medium from AP-CXCL14- or control AP-expressing cells was used as an affinity reagent to stain
normal and cancerous mammary tissue sections. Blue staining indicated the presence of a
CXCL14 binding protein in certain leukocytes and breast epithelial cells. These findings suggest
the presence of a cell surface CXCL14 binding protein(s) in cancerous and normal mammary
epithelial cells and are consistent with a paracrine mechanism of CXCL14 action in the breast.
To test further the binding characteristics of AP-CXCL14, in vitro ligand binding assays were
carried out using various cell lines. Low level AP-CXCL14 binding was detected in all cell lines
tested including MDA-MB-231 and MDA-MB-435 breast cancer and MCF10A immortalized
mammary epithelial cells (data not shown). To further characterize the AP-CXCL14-putative
CXCL14 receptor interaction, more detailed binding assays were carried out using
breast cancer cells. Scatchard plot analysis showed two binding slopes in MDA-MB-231 cells,
thereby indicating the presence of high (Kd=6.1x10^-8 M) and low affinity (Kd=56.7x10^-8 M)
binding sites (Fig. 6A).
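
For readers unfamiliar with Scatchard analysis, the relationship between a binding slope and a dissociation constant can be sketched as follows. In a Scatchard plot, bound/free ligand is plotted against bound ligand; for a single class of sites the points fall on a line with slope -1/Kd and x-intercept Bmax. The code below is an illustrative sketch only: the function names are hypothetical, the data are synthetic (generated from an assumed one-site model, not taken from Fig. 6A), and the fitting procedure actually used is not stated in this description.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def scatchard_kd(bound, free):
    """Estimate Kd and Bmax from one linear Scatchard segment.

    bound, free: ligand concentrations in the same units (e.g., M).
    """
    ratios = [b / f for b, f in zip(bound, free)]
    slope, intercept = fit_line(bound, ratios)
    kd = -1.0 / slope        # slope of the Scatchard line is -1/Kd
    bmax = intercept * kd    # x-intercept of the Scatchard line
    return kd, bmax

# Synthetic one-site data (assumed values, for illustration only).
true_kd, true_bmax = 6.1e-8, 1.0e-9
free_conc = [1e-8, 3e-8, 1e-7, 3e-7]
bound_conc = [true_bmax * f / (true_kd + f) for f in free_conc]
est_kd, est_bmax = scatchard_kd(bound_conc, free_conc)
```

When two classes of sites are present the plot is curved, and fitting a separate line to the steep and shallow portions, as two calls to `scatchard_kd` would do, gives only approximate high- and low-affinity constants.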

In previous studies, CXCL12 was demonstrated to enhance breast cancer cell growth,
migration and invasion [Hall et al. (2003) Mol. Endocrinol. 17:792-803; Muller et al. (2001)] and
it was hypothesized to be involved in metastasis [Kang et al. (2003) Cancer Cell 3:537-549;
Muller et al. (2001)]. The present demonstration that it is highly expressed in myofibroblasts
from DCIS, a pre-invasive tumor, indicates that it is likely to have additional roles in earlier
stages of breast tumorigenesis. In order to determine if CXCL14 has similar effects, the effect of
conditioned medium containing AP-CXCL14 on the growth of MDA-MB-231 and MCF10A
cells was tested and its effect on cell migration and invasion was investigated using MDA-MB-231
cells. Conditioned media of cells transfected with AP alone and CXCL12 were used as
negative and positive controls, respectively. Similar to CXCL12, AP-CXCL14 enhanced the
proliferation of MDA-MB-231 and MCF10A cells and the migration and invasion of MDA-MB-231
cells (Figs. 6B and C and data not shown). In these experiments, the concentration of
AP-CXCL14 was 2-30 nM, which is similar to the concentration ranges of several chemokines,
including CXCL12, required for biological effects. The same results were obtained in cell
migration and invasion assays using CXCL14-AP (C-terminal AP-tag) and CXCL14-HA
(C-terminal HA-tag) fusion proteins (Fig. 6C and data not shown). Thus, the observed effects are
not likely to be due to the position or identity of the epitope tag. Further suggesting that
mammary epithelial cells have a functional CXCL14 receptor, experiments using recombinant
CXCL14 protein and CXCL14 expressing adenovirus demonstrated the induction of calcium
flux in MDA-MB-231 and activation of Akt kinase in MCF10A cells, respectively (data not
shown).

A number of embodiments of the invention have been described. Nevertheless, it will be
understood that various modifications may be made without departing from the spirit and scope 
of the invention. Accordingly, other embodiments are within the scope of the following claims. 